alshubati99/Data-Analysis-Python_MLSA

Prepare your data using Python and VS Code

Workshop hosted by Alpha MLSA Himmatsingh Bhagwanisingh Deora and co-hosted by Beta MLSA Khawlah Alshubati

Workshop Link

What is MLSA?

With the Microsoft Learn Student Ambassadors program, you can amplify your impact and establish yourself as a mentor and leader in your community while developing the technical skills and emotional intelligence you need to succeed. Application Link

Module Source

Manipulate and clean data in Python

Goals

In this workshop, you will learn how to use Python, and popular libraries like NumPy and pandas, to manipulate and clean data to prepare it for analysis.

Goal | Description
What you will learn | How to find information about, clean, and prepare data that's stored in a pandas DataFrame.
What you'll need | A Visual Studio Code environment set up to run Python and Jupyter notebooks
Duration | 1 hr 20 min
Just want to try the app or see the solution? | Solution

Pre-Learning

Prerequisites

What you will learn

Say you want to analyze a dataset that you find interesting, such as the squirrel population of Central Park or various types of French cheese. The first step with any dataset is to clean it up. Many datasets have missing information or aren't formatted exactly the way you'd like. In this workshop, you will learn how to use data science libraries to prepare your data for analysis and visualization.

[Image: completed project]

Introduction

In this section, you'll review an introduction and make sure that your data science environment is set up correctly before continuing on to the next part of the workshop.
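One quick way to confirm the environment is ready is to import the libraries and print their versions in a notebook cell or Python file. This is only a minimal sketch; it assumes NumPy and pandas are the libraries you need, as stated in the goals above, and it does not check for any particular version.

```python
import sys

import numpy as np
import pandas as pd

# If any of these imports fail, the package is not installed
# in the interpreter that VS Code has selected.
print("Python:", sys.version.split()[0])
print("NumPy:", np.__version__)
print("pandas:", pd.__version__)
```

If an import raises ModuleNotFoundError, check which Python interpreter or Jupyter kernel VS Code is using before reinstalling anything.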

Explore DataFrame information

Next, you will learn how to use Python libraries to explore an iconic dataset. You will learn how to use pandas DataFrames to get a quick sense of the size, shape, and content of a dataset.
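As a rough illustration of what this exploration can look like, the sketch below builds a tiny DataFrame and inspects it. The column names and values are made up for this example and are not the workshop's dataset.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "species": ["gray", "gray", "cinnamon", "black"],
    "weight_g": [520.0, 480.5, 510.0, 495.0],
    "age_years": [2, 3, 1, 4],
})

print(df.shape)    # (number of rows, number of columns)
df.info()          # column names, non-null counts, and dtypes
print(df.head(2))  # first two rows
print(df.tail(2))  # last two rows
print(df.dtypes)   # data type of each column
```

`shape` and `info()` describe the size and structure, while `head()` and `tail()` show actual rows so you can see what the content looks like.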

Work with missing data

Now that you know how to get an overall sense of the dataset you are working with, you will learn how to identify and deal with missing values.
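The sketch below shows one common pattern for this: count the missing values, then either drop the affected rows or fill them in. The data is hypothetical, and filling with the column mean is just one possible strategy, not necessarily the one the workshop uses.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in both columns.
df = pd.DataFrame({
    "name": ["brie", "comte", None, "roquefort"],
    "fat_pct": [28.0, np.nan, 31.0, np.nan],
})

# Count missing values in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Option 1: drop every row that has at least one missing value.
dropped = df.dropna()

# Option 2: fill the gaps, using a placeholder for text
# and the column mean for numbers.
filled = df.fillna({"name": "unknown", "fat_pct": df["fat_pct"].mean()})
print(filled)
```

Dropping is simple but discards data (only one row survives here), while filling keeps every row at the cost of introducing estimated values.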

Remove duplicate data

Another common task with most datasets is removing duplicate data. In this section of the workshop, you will learn how to use pandas to detect and remove duplicate entries.
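A minimal sketch of detecting and removing duplicates follows. The DataFrame is hypothetical; `duplicated()` and `drop_duplicates()` are the pandas methods typically used for this.

```python
import pandas as pd

# Hypothetical data in which the third row repeats the second.
df = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "city": ["Paris", "Lyon", "Lyon", "Nice"],
})

# True for each row that repeats an earlier row exactly.
dup_mask = df.duplicated()
print(dup_mask)

# Keep the first occurrence of each row and renumber the index.
deduped = df.drop_duplicates().reset_index(drop=True)
print(deduped)
```

To treat rows as duplicates based on some columns only, both methods accept a `subset` argument, for example `df.drop_duplicates(subset=["id"])`.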

Combine datasets

Sometimes, you will need to combine datasets. Luckily, pandas provides several methods to merge and join datasets.
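The sketch below illustrates two of these approaches with made-up tables: `merge` to match rows on a shared key, and `concat` to stack tables on top of each other. The table and column names are assumptions for this example only.

```python
import pandas as pd

# Two hypothetical tables that share an "id" column.
left = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
right = pd.DataFrame({"id": [2, 3, 4], "score": [0.5, 0.7, 0.9]})

# Inner merge: keep only ids present in both tables.
inner = left.merge(right, on="id", how="inner")

# Outer merge: keep every id, leaving NaN where a table has no match.
outer = left.merge(right, on="id", how="outer")

# Concatenate: stack tables with the same columns vertically.
stacked = pd.concat([left, left], ignore_index=True)

print(inner)
print(outer)
print(stacked)
```

The `how` argument controls which keys survive the merge, so it is worth checking the row count of the result against what you expect.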

Exploratory statistics and visualization

So far, you've learned how to use pandas methods to examine some aspects of a DataFrame, and fill in, remove, and combine data. The final way we will seek to understand our data is by creating visualizations.
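As a small illustration, the sketch below computes summary statistics and draws a histogram. The data is hypothetical, and matplotlib is an assumed plotting library here, since the text above does not say which one the workshop uses.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "value": [1.0, 3.0, 2.0, 4.0, 6.0],
})

# Summary statistics: count, mean, std, min, quartiles, max.
summary = df["value"].describe()
print(summary)

# Mean of "value" within each group.
group_means = df.groupby("group")["value"].mean()
print(group_means)

# Histogram of the "value" column.
fig, ax = plt.subplots()
df["value"].plot.hist(bins=5, ax=ax)
ax.set_xlabel("value")
ax.set_title("Distribution of value")
```

In a Jupyter notebook the figure is displayed inline after the cell runs; in a script you would call `plt.show()` or `fig.savefig(...)` instead.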

Next steps

Practice

To test your knowledge, try downloading a free dataset from Kaggle that you find interesting. Use the techniques that you learned in this workshop to manipulate and clean your data!

Feedback

Send us your feedback about this workshop here