# Data Analysis with Pandas

### Warmup Exercises

## Goals

By the end of these warmup exercises, you will:

- have a general overview of how the pandas Python library works.

## Introduction

Have you ever wanted to do a couple of warmup exercises for the `pandas` Python library?  If so, this is one notebook you're not going to want to miss.

## Imports

- *pandas* [pandas.pydata.org](https://pandas.pydata.org/)

In [1]:
import pandas as pd

## Preparing the Data

### Loading Data

We will define our own Python dictionary to use as test data.

In [6]:
students = {
            'Name': ['Bill', 'Gladys', 'Ethel', 'Elmer', 'Jimmy', 'Martha', 'Delores'],
            'Grades': [98, 89, 99, 87, 90, 83, 99],
            'Gender': ['Male', 'Female', 'Female', 'Male', 'Male', 'Female', 'Female']
            }

...and convert the dictionary into a pandas `DataFrame`.

In [8]:
students_df = pd.DataFrame(students)

## Warmup Exercises

### Display the top 3 rows of the dataset

In [13]:
students_df.head(3)

Unnamed: 0,Name,Grades,Gender
0,Bill,98,Male
1,Gladys,89,Female
2,Ethel,99,Female


### Display the last 3 rows of the dataset

In [11]:
students_df.tail(3)

Unnamed: 0,Name,Grades,Gender
4,Jimmy,90,Male
5,Martha,83,Female
6,Delores,99,Female


### Find the shape of the dataset
The shape is the number of rows and the number of columns.

In [20]:
students_df.shape # DataFrame attribute: a (rows, cols) tuple

print(f'Rows : {students_df.shape[0]}')
print(f'Cols : {students_df.shape[1]}')

Rows : 7
Cols : 3


### Get general information about the dataset

In [None]:
students_df.info()
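
`info()` prints its report instead of returning it, so it can be handy to capture the text or to look at individual pieces of it. The sketch below rebuilds the same test data so that it runs on its own; `buffer` and `report` are names chosen for illustration. The dtype shown for the text columns may differ between pandas versions.

```python
import io

import pandas as pd

students = {
    'Name': ['Bill', 'Gladys', 'Ethel', 'Elmer', 'Jimmy', 'Martha', 'Delores'],
    'Grades': [98, 89, 99, 87, 90, 83, 99],
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Male', 'Female', 'Female'],
}
students_df = pd.DataFrame(students)

# info() writes to stdout by default; buf= redirects the report to a text buffer
buffer = io.StringIO()
students_df.info(buf=buffer)
report = buffer.getvalue()
print(report)

# The same facts are also available as ordinary attributes
print(students_df.dtypes)  # dtype of each column
print(len(students_df))    # number of rows
```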

### Check for null values in the dataset

In [None]:
students_df.isnull().sum() # sum() defaults to axis=0 (one count per column); sum(axis=1) gives one count per row
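
The test data has no missing values, so every count above is zero. To see the two axes in action, here is a minimal sketch on a small made-up frame (`demo_df` is hypothetical and not part of the exercise data) with two gaps in it.

```python
import pandas as pd

# Hypothetical frame with one missing name and one missing grade
demo_df = pd.DataFrame({
    'Name': ['Bill', 'Gladys', None],
    'Grades': [98, None, 99],
})

nulls_per_column = demo_df.isnull().sum()     # axis=0: one count per column
nulls_per_row = demo_df.isnull().sum(axis=1)  # axis=1: one count per row

print(nulls_per_column)
print(nulls_per_row)
```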

### Get overall statistics about the dataset

In [None]:
students_df.describe() # numeric columns only; describe(include='all') also summarizes the non-numeric columns
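
As a rough illustration of the difference between the two calls, the sketch below rebuilds the test data and compares them. The names `numeric_summary` and `full_summary` are chosen here for illustration.

```python
import pandas as pd

students = {
    'Name': ['Bill', 'Gladys', 'Ethel', 'Elmer', 'Jimmy', 'Martha', 'Delores'],
    'Grades': [98, 89, 99, 87, 90, 83, 99],
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Male', 'Female', 'Female'],
}
students_df = pd.DataFrame(students)

# Default: only numeric columns, with count, mean, std, min, quartiles, max
numeric_summary = students_df.describe()

# include='all': every column; text columns get unique, top (most frequent) and freq
full_summary = students_df.describe(include='all')

print(numeric_summary)
print(full_summary)
```

Rows that do not apply to a column (for example `mean` for `Gender`) are filled with `NaN` in the combined summary.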

### Find unique values from the `Gender` column

In [None]:
students_df['Gender'].unique()

### Find the number of unique values from the `Gender` column

In [30]:
students_df['Gender'].nunique() # number of distinct values in the column

2

### Display the count of unique values in the `Gender` column

In [31]:
students_df['Gender'].value_counts()

Female    4
Male      3
Name: Gender, dtype: int64
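
If proportions are more useful than raw counts, `value_counts` accepts `normalize=True`. A minimal sketch, rebuilding only the `Gender` column so that it runs on its own (`genders` and `shares` are illustrative names):

```python
import pandas as pd

genders = pd.Series(
    ['Male', 'Female', 'Female', 'Male', 'Male', 'Female', 'Female'],
    name='Gender',
)

# normalize=True returns each value's share of the total instead of its count
shares = genders.value_counts(normalize=True)
print(shares)
```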

### Find the total number of students having grades between 90 and 100 (inclusive)

In [None]:
in_range = (students_df['Grades'] >= 90) & (students_df['Grades'] <= 100) # Boolean mask: True where the grade is in range
students_df[in_range] # keeps only the rows where the mask is True
len(students_df[in_range]) # number of matching rows
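
Because `True` counts as 1 and `False` as 0, the matching rows can also be counted straight from the Boolean mask without building the filtered frame first. A small sketch using only the grades (`grades`, `mask` and `count` are illustrative names):

```python
import pandas as pd

grades = pd.Series([98, 89, 99, 87, 90, 83, 99], name='Grades')

# Each comparison yields a Boolean Series; & combines them element-wise.
# The parentheses are required because & binds more tightly than >= and <=.
mask = (grades >= 90) & (grades <= 100)

count = int(mask.sum())  # number of True entries
print(count)
```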

#### Same as above, but using the `between` method

In [37]:
sum(students_df['Grades'].between(90, 100)) # both bounds are inclusive by default; each True counts as 1 when summed

4
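
Whether the endpoints count is controlled by the `inclusive` argument. The sketch below assumes pandas 1.3 or later, where `inclusive` takes the strings `'both'`, `'neither'`, `'left'` or `'right'`; older versions expect a Boolean instead.

```python
import pandas as pd

grades = pd.Series([98, 89, 99, 87, 90, 83, 99], name='Grades')

both = int(grades.between(90, 100).sum())                          # 90 <= g <= 100
neither = int(grades.between(90, 100, inclusive='neither').sum())  # 90 <  g <  100
left = int(grades.between(90, 100, inclusive='left').sum())        # 90 <= g <  100

print(both, neither, left)
```

The grade of exactly 90 is what separates the results: it is counted whenever the left bound is inclusive and dropped otherwise.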

### Find average grades

In [38]:
students_df['Grades'].mean()

92.14285714285714

### `apply` method: calling a user-defined function

In [None]:
def grades(x):
    return x // 2 # floor division: divides and rounds down to a whole number

students_df['Grades'].apply(grades)

#### and add it as a new column

In [None]:
students_df['Half_Grades'] = students_df['Grades'].apply(grades)
students_df

#### Lambda version

In [44]:
students_df['Grades'].apply(lambda x: x // 2)

0    49
1    44
2    49
3    43
4    45
5    41
6    49
Name: Grades, dtype: int64
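
For simple arithmetic like this, the operator can also be applied to the whole column at once, which avoids calling a Python function for every row. A minimal sketch comparing the two approaches (`halved` and `applied` are illustrative names):

```python
import pandas as pd

grades = pd.Series([98, 89, 99, 87, 90, 83, 99], name='Grades')

# Vectorized: // is applied element-wise to the entire Series
halved = grades // 2

# Row-by-row equivalent using apply
applied = grades.apply(lambda x: x // 2)

print(halved.equals(applied))
```

`apply` remains the more flexible choice when the logic cannot be written as column-wide operations.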

### Map function

Example: convert 'Male' to 1 and 'Female' to 0.

In [47]:
students_df['Male_Female'] = students_df['Gender'].map({'Male': 1, 'Female': 0})
students_df

Unnamed: 0,Name,Grades,Gender,Half_Grades,Male_Female
0,Bill,98,Male,49,1
1,Gladys,89,Female,44,0
2,Ethel,99,Female,49,0
3,Elmer,87,Male,43,1
4,Jimmy,90,Male,45,1
5,Martha,83,Female,41,0
6,Delores,99,Female,49,0
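
One thing to watch with a dictionary passed to `map`: any value that has no matching key becomes `NaN`. The sketch below uses a made-up Series (`genders`, with a hypothetical 'Unknown' entry that is not in the exercise data) to show the effect.

```python
import pandas as pd

# Hypothetical input containing a value the mapping does not cover
genders = pd.Series(['Male', 'Female', 'Unknown'])

codes = genders.map({'Male': 1, 'Female': 0})
print(codes)

# The unmapped entry is NaN, which also turns the result into a float column
print(codes.isna().tolist())
```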


### Drop columns

In [None]:
students_df

In [None]:
students_df.drop('Male_Female', axis=1) # axis=1 means columns; returns a new DataFrame and leaves students_df unchanged
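
Without `inplace=True`, `drop` hands back a new `DataFrame` and the original keeps all of its columns. A minimal sketch on a small made-up frame (`demo_df` is hypothetical), which also shows the `columns=` keyword as an alternative to `axis=1`:

```python
import pandas as pd

demo_df = pd.DataFrame({
    'Name': ['Bill', 'Gladys'],
    'Grades': [98, 89],
    'Male_Female': [1, 0],
})

trimmed = demo_df.drop('Male_Female', axis=1)        # new frame without the column
same_result = demo_df.drop(columns='Male_Female')    # equivalent spelling

print(list(trimmed.columns))
print(list(demo_df.columns))  # the original still has all three columns
```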

#### and for multiple columns

In [None]:
students_df.drop(['Male_Female', 'Half_Grades'], axis=1, inplace=True) # pass multiple columns as a list; inplace=True modifies students_df itself

In [54]:
students_df

Unnamed: 0,Name,Grades,Gender
0,Bill,98,Male
1,Gladys,89,Female
2,Ethel,99,Female
3,Elmer,87,Male
4,Jimmy,90,Male
5,Martha,83,Female
6,Delores,99,Female


### Print column names

In [57]:
students_df.columns # DF attribute

Index(['Name', 'Grades', 'Gender'], dtype='object')

### Sort the dataset by the `Grades` column

In [59]:
students_df.sort_values(by='Grades') # ascending order by default (ascending=True)

Unnamed: 0,Name,Grades,Gender
5,Martha,83,Female
3,Elmer,87,Male
1,Gladys,89,Female
4,Jimmy,90,Male
0,Bill,98,Male
2,Ethel,99,Female
6,Delores,99,Female
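
To put the highest grades first, pass `ascending=False`. When grades tie (Ethel and Delores both have 99), a second sort key settles the order. A minimal sketch, rebuilding the test data so that it runs on its own (`ranked` is an illustrative name):

```python
import pandas as pd

students = {
    'Name': ['Bill', 'Gladys', 'Ethel', 'Elmer', 'Jimmy', 'Martha', 'Delores'],
    'Grades': [98, 89, 99, 87, 90, 83, 99],
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Male', 'Female', 'Female'],
}
students_df = pd.DataFrame(students)

# Grades descending, then Name ascending to break ties
ranked = students_df.sort_values(by=['Grades', 'Name'], ascending=[False, True])
print(ranked)
```

Like `drop`, `sort_values` returns a new `DataFrame`, so `students_df` keeps its original order.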


### Display the name and grades of the female students

In [63]:
students_df[students_df['Gender'] == 'Female'][['Name', 'Grades']]

Unnamed: 0,Name,Grades
1,Gladys,89
2,Ethel,99
5,Martha,83
6,Delores,99


You could also do this with the `isin` method.  Keep track of your brackets though.

In [64]:
students_df[students_df['Gender'].isin(['Female'])][['Name', 'Grades']]

Unnamed: 0,Name,Grades
1,Gladys,89
2,Ethel,99
5,Martha,83
6,Delores,99
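
A third option that keeps the bracket count down is `.loc`, which takes the row filter and the column list in a single step. A minimal sketch, rebuilding the test data so that it runs on its own (`female_students` is an illustrative name):

```python
import pandas as pd

students = {
    'Name': ['Bill', 'Gladys', 'Ethel', 'Elmer', 'Jimmy', 'Martha', 'Delores'],
    'Grades': [98, 89, 99, 87, 90, 83, 99],
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Male', 'Female', 'Female'],
}
students_df = pd.DataFrame(students)

# .loc[row_selector, column_selector]: Boolean mask for rows, list of labels for columns
female_students = students_df.loc[students_df['Gender'] == 'Female', ['Name', 'Grades']]
print(female_students)
```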
