- ## *`Pandas`:*
    1. Open-source library built on top of NumPy.
    2. Used for data preparation and cleaning.
    3. Has built-in visualization features.
    4. Works with data from a wide variety of sources.

In [3]:
import numpy as np
import pandas as pd

<img src='ds.png'>

- ### *`Series : ` Single-column data that represents a single variable and holds information of a single type.*
- ### *`Data Frame : ` Multi-column data in which each column represents a different variable and has its own data type, so the information can be heterogeneous.*
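
A minimal sketch of the two definitions above. The names `ages` and `people` and their values are made up for illustration.

```python
import pandas as pd

# A Series holds one variable with a single dtype.
ages = pd.Series([25, 32, 47], name='age')

# A DataFrame holds several columns, each with its own dtype.
people = pd.DataFrame({
    'name': ['Ann', 'Bob', 'Cid'],    # text
    'age': [25, 32, 47],              # integers
    'height_m': [1.62, 1.80, 1.75],   # floats
})

print(ages.dtype)      # dtype of the single column
print(people.dtypes)   # one dtype per column

# Selecting one column of a DataFrame gives back a Series.
print(type(people['age']))
```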

### *`Series` -> pd.Series(data=.., index=.., dtype=.., name=..., copy=..)*

In [5]:
# From List to Series
chars = ['G', 'E', 'E', 'K', 'S', 'F', 'O', 'R', 'G', 'E', 'E', 'K', 'S']

ser1 = pd.Series(data=chars)

ser1

0     G
1     E
2     E
3     K
4     S
5     F
6     O
7     R
8     G
9     E
10    E
11    K
12    S
dtype: object

In [None]:
# You can change the indices.
ser2 = pd.Series(data=chars, index=np.arange(13, 26))

ser2

13    G
14    E
15    E
16    K
17    S
18    F
19    O
20    R
21    G
22    E
23    E
24    K
25    S
dtype: object

In [9]:
# Put a Name for the Series
ser3 = pd.Series(chars, name='Characters')

ser3

0     G
1     E
2     E
3     K
4     S
5     F
6     O
7     R
8     G
9     E
10    E
11    K
12    S
Name: Characters, dtype: object

In [12]:
# From Dict to Series
chars = {'G': 2, 'E': 4, 'K': 2, 'S': 2, 'F': 1, 'O': 1, 'R': 1}

# A dictionary has Key:Value pairs -> the Series uses the keys as the indices and the values as the column values
ser4 = pd.Series(chars, name='Chars')
ser4

G    2
E    4
K    2
S    2
F    1
O    1
R    1
Name: Chars, dtype: int64

*`Indexing and Slicing : ` You can use the default numeric indices or the index labels you defined.*

In [None]:
ser4['G']   # Access the value whose index label is 'G'

np.int64(2)

In [None]:
# Slicing by label (non-numeric indices) -> the Stop label is INCLUSIVE
# [Start: Stop]
ser4['G': 'S']

G    2
E    4
K    2
S    2
Name: Chars, dtype: int64

In [14]:
ser3

0     G
1     E
2     E
3     K
4     S
5     F
6     O
7     R
8     G
9     E
10    E
11    K
12    S
Name: Characters, dtype: object

In [15]:
ser3[5]

'F'

In [None]:
# Slicing with integers is positional -> the Stop value is EXCLUSIVE
ser3[0:4]
# Access rows from position 0 to position 3   (4 values)

0    G
1    E
2    E
3    K
Name: Characters, dtype: object

In [19]:
# [Start: Stop: Step]
ser3[1:5:2]

1    E
3    K
Name: Characters, dtype: object
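
A short sketch of the inclusive/exclusive rule above, using `.loc` (labels) and `.iloc` (positions) so the intent is explicit. The `counts` Series is a hypothetical, shortened version of the dictionary example.

```python
import pandas as pd

counts = pd.Series({'G': 2, 'E': 4, 'K': 2, 'S': 2, 'F': 1}, name='Chars')

by_label = counts.loc['E':'S']    # label slice: the stop label 'S' is included
by_position = counts.iloc[1:3]    # position slice: position 3 is excluded

print(by_label)
print(by_position)
```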

*`Arithmetic Operations`*

In [25]:
ser1

0     G
1     E
2     E
3     K
4     S
5     F
6     O
7     R
8     G
9     E
10    E
11    K
12    S
dtype: object

In [26]:
ser3

0     G
1     E
2     E
3     K
4     S
5     F
6     O
7     R
8     G
9     E
10    E
11    K
12    S
Name: Characters, dtype: object

In [None]:
# 2 different series with common indices.
# Add the values with common index -> String Concatenation (In this case)
ser3 + ser1

0     GG
1     EE
2     EE
3     KK
4     SS
5     FF
6     OO
7     RR
8     GG
9     EE
10    EE
11    KK
12    SS
dtype: object

In [27]:
s1 = pd.Series([1, 2, 3, 4], index=['A', 'B', 'C', 'D'])

s2 = pd.Series([1, 2, 3, 4], index=['A', 'C', 'E', 'F'])

s1 + s2

A    2.0
B    NaN
C    5.0
D    NaN
E    NaN
F    NaN
dtype: float64

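*Arithmetic aligns the two Series on their index labels: the operation is applied where a label exists in both, and a label found in only one Series gives NaN. The same holds for the operators other than addition.*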

In [28]:
s1 * s2

A    1.0
B    NaN
C    6.0
D    NaN
E    NaN
F    NaN
dtype: float64

In [29]:
s1 - s2

A    0.0
B    NaN
C    1.0
D    NaN
E    NaN
F    NaN
dtype: float64

In [30]:
s1 / s2

A    1.0
B    NaN
C    1.5
D    NaN
E    NaN
F    NaN
dtype: float64

In [32]:
s1 // s2

A    1.0
B    NaN
C    1.0
D    NaN
E    NaN
F    NaN
dtype: float64

In [31]:
s1 ** s2

A    1.0
B    NaN
C    9.0
D    NaN
E    NaN
F    NaN
dtype: float64
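
If the NaN results for unmatched labels are not wanted, one option is the method form of the operator with `fill_value`. A minimal sketch, assuming a missing label should count as 0:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4], index=['A', 'B', 'C', 'D'])
s2 = pd.Series([1, 2, 3, 4], index=['A', 'C', 'E', 'F'])

# Treat a label missing from one side as 0 instead of producing NaN.
total = s1.add(s2, fill_value=0)
print(total)
```

`sub`, `mul`, and `div` accept `fill_value` in the same way. Which fill value makes sense depends on the operation.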

### *`Data Frames` -> pd.DataFrame(data=.., index=.., columns=.., dtype=.., copy=..)*

<img src='df.png'>

*You have the flexibility to set the column names, the index labels, and the data type of the input values.*

In [None]:
# Create a DF From Numpy Array
pd.DataFrame(np.arange(12).reshape(3, 4), columns=['P', 'Q', 'R', 'S'])

   P  Q   R   S
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11


In [None]:
# Create a DF from Nested Lists
pd.DataFrame([[2, 3], [4, 5], [1, 6]], columns=['max_speed', 'shield'], index=['A', 'B', 'C'])

   max_speed  shield
A          2       3
B          4       5
C          1       6


In [36]:
# Create a DF From Dictionary

fruits = {'Oranges': [3, 2, 1, 0],
        'Apples': [0, 3, 7, 2],
        'Grapes': [5, 6, 9, 0],
        'Pear': [1, 23, 45, 1]
        }

df = pd.DataFrame(fruits, index=['June', 'July', 'Augast', 'September'])
df

           Oranges  Apples  Grapes  Pear
June             3       0       5     1
July             2       3       6    23
Augast           1       7       9    45
September        0       2       0     1


*`Indexing & Slicing of DataFrames -> You have two points of reference [Rows, Columns] -> [Start:Stop:Step, Start:Stop:Step]` -> .iloc[] vs. .loc[]*

In [37]:
# df.loc[] -> Select Rows/ Columns by Names or Labels.
# df.iloc[] -> Select Rows/ Columns by Position.

In [None]:
# Select Single Row
print(df.iloc[1])      # Second row (position 1)

df.loc['July']          # The Same Row using Label/Name

Oranges     2
Apples      3
Grapes      6
Pear       23
Name: July, dtype: int64


Oranges     2
Apples      3
Grapes      6
Pear       23
Name: July, dtype: int64

In [None]:
# Select Single Column
print(df.iloc[:, 1])    # Access all rows of the column at position 1 (the second column)

df.loc[:, 'Apples']     # The Same Column using Name

June         0
July         3
Augast       7
September    2
Name: Apples, dtype: int64


June         0
July         3
Augast       7
September    2
Name: Apples, dtype: int64

In [45]:
# Select Multiple Rows
print(df.iloc[[1, 3]])      # Access the rows at positions 1 and 3

df.loc[['July', 'September']]

           Oranges  Apples  Grapes  Pear
July             2       3       6    23
September        0       2       0     1


           Oranges  Apples  Grapes  Pear
July             2       3       6    23
September        0       2       0     1


In [48]:
# Select Multiple Columns
print(df.iloc[:, [0, -1]])      # Access 1st and Last Column

df.loc[:, ['Oranges', 'Pear']]

           Oranges  Pear
June             3     1
July             2    23
Augast           1    45
September        0     1


           Oranges  Pear
June             3     1
July             2    23
Augast           1    45
September        0     1


In [None]:
# Select Rows Range -> Slicing
print(df.iloc[1: 4])    # Access 3 rows, positions 1 to 3 (Stop is exclusive)

df.loc['July': 'September']     # Stop here is Inclusive

           Oranges  Apples  Grapes  Pear
July             2       3       6    23
Augast           1       7       9    45
September        0       2       0     1


           Oranges  Apples  Grapes  Pear
July             2       3       6    23
Augast           1       7       9    45
September        0       2       0     1


In [52]:
# Select Columns Range
print(df.iloc[:, 1: 4])

df.loc[:, 'Apples': 'Pear']

           Apples  Grapes  Pear
June            0       5     1
July            3       6    23
Augast          7       9    45
September       2       0     1


           Apples  Grapes  Pear
June            0       5     1
July            3       6    23
Augast          7       9    45
September       2       0     1


In [54]:
# You can apply conditions
df.loc[df['Apples'] > 2]

        Oranges  Apples  Grapes  Pear
July          2       3       6    23
Augast        1       7       9    45


In [55]:
df.loc[df['Apples'] > 2, 'Oranges']

July      2
Augast    1
Name: Oranges, dtype: int64

In [56]:
# You can use a lambda function
df.loc[lambda df: df['Apples'] == 3]

      Oranges  Apples  Grapes  Pear
July        2       3       6    23


*`Filtering`*

In [57]:
df

           Oranges  Apples  Grapes  Pear
June             3       0       5     1
July             2       3       6    23
Augast           1       7       9    45
September        0       2       0     1


In [59]:
df.loc[df['Oranges']> 1]

      Oranges  Apples  Grapes  Pear
June        3       0       5     1
July        2       3       6    23


In [60]:
df[df['Apples'] == 3]

      Oranges  Apples  Grapes  Pear
July        2       3       6    23


In [None]:
# df.query(Query_Expression)
df.query("Pear > 1 ")

        Oranges  Apples  Grapes  Pear
July          2       3       6    23
Augast        1       7       9    45


In [64]:
# Combine 2 or more conditions -> & (AND) / | (OR), with each condition in parentheses
df.loc[(df['Apples'] > 2) & (df['Grapes'] >= 0)]

        Oranges  Apples  Grapes  Pear
July          2       3       6    23
Augast        1       7       9    45
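
The filters above use `&` (AND). A sketch of the other common combinations, on a copy of the fruit table with abbreviated month labels (`Jun` to `Sep` are stand-ins chosen here):

```python
import pandas as pd

fruits = {'Oranges': [3, 2, 1, 0],
          'Apples': [0, 3, 7, 2],
          'Grapes': [5, 6, 9, 0],
          'Pear': [1, 23, 45, 1]}
df = pd.DataFrame(fruits, index=['Jun', 'Jul', 'Aug', 'Sep'])

either = df[(df['Apples'] > 2) | (df['Oranges'] > 2)]   # OR
not_big = df[~(df['Apples'] > 2)]                       # NOT
picked = df[df['Pear'].isin([1, 45])]                   # membership test
same = df.query("Apples > 2 and Grapes >= 0")            # query-string form of AND

print(either)
print(not_big)
print(picked)
print(same)
```

Each condition needs its own parentheses because `&`, `|`, and `~` bind more tightly than comparison operators in Python.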


### *`Analyzing DataFrames : ` (head, tail, info, describe)*

In [65]:
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=['A', 'B', 'C'], index=['X', 'Y', 'Z'] )

df

   A  B  C
X  1  2  3
Y  4  5  6
Z  7  8  9


In [66]:
df.head(2)

   A  B  C
X  1  2  3
Y  4  5  6


In [67]:
df.tail(1)

   A  B  C
Z  7  8  9


In [68]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 3 entries, X to Z
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       3 non-null      int64
 1   B       3 non-null      int64
 2   C       3 non-null      int64
dtypes: int64(3)
memory usage: 96.0+ bytes


In [69]:
df.describe()

         A    B    C
count  3.0  3.0  3.0
mean   4.0  5.0  6.0
std    3.0  3.0  3.0
min    1.0  2.0  3.0
25%    2.5  3.5  4.5
50%    4.0  5.0  6.0
75%    5.5  6.5  7.5
max    7.0  8.0  9.0


In [5]:
df['A'].unique()

array([1, 4, 7])

In [6]:
df['A'].nunique()

3

In [7]:
df.shape

(3, 3)

In [9]:
df.size

9

### *`Load in Dataframes from Files`*

In [None]:
# Load a CSV File -> pd.read_csv(..., delimiter=.., ..etc.)

df = pd.read_csv('data.csv')

# Display the first 5 rows (default), or pass the number of rows you want.
df.head()

Unnamed: 0,Make,Model,Year,Engine Fuel Type,Engine HP,Engine Cylinders,Transmission Type,Driven_Wheels,Number of Doors,Market Category,Vehicle Size,Vehicle Style,highway MPG,city mpg,Popularity,MSRP
0,BMW,1 Series M,2011,premium unleaded (required),335.0,6.0,MANUAL,rear wheel drive,2.0,"Factory Tuner,Luxury,High-Performance",Compact,Coupe,26,19,3916,46135
1,BMW,1 Series,2011,premium unleaded (required),300.0,6.0,MANUAL,rear wheel drive,2.0,"Luxury,Performance",Compact,Convertible,28,19,3916,40650
2,BMW,1 Series,2011,premium unleaded (required),300.0,6.0,MANUAL,rear wheel drive,2.0,"Luxury,High-Performance",Compact,Coupe,28,20,3916,36350
3,BMW,1 Series,2011,premium unleaded (required),230.0,6.0,MANUAL,rear wheel drive,2.0,"Luxury,Performance",Compact,Coupe,28,18,3916,29450
4,BMW,1 Series,2011,premium unleaded (required),230.0,6.0,MANUAL,rear wheel drive,2.0,Luxury,Compact,Convertible,28,18,3916,34500


In [71]:
# Display the last 5 rows (default), or pass the number of rows you want.
df.tail(10)

Unnamed: 0,Make,Model,Year,Engine Fuel Type,Engine HP,Engine Cylinders,Transmission Type,Driven_Wheels,Number of Doors,Market Category,Vehicle Size,Vehicle Style,highway MPG,city mpg,Popularity,MSRP
11904,BMW,Z8,2002,premium unleaded (required),394.0,8.0,MANUAL,rear wheel drive,2.0,"Exotic,Luxury,High-Performance",Compact,Convertible,19,12,3916,130000
11905,BMW,Z8,2003,premium unleaded (required),394.0,8.0,MANUAL,rear wheel drive,2.0,"Exotic,Luxury,High-Performance",Compact,Convertible,19,12,3916,131500
11906,Acura,ZDX,2011,premium unleaded (required),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,46020
11907,Acura,ZDX,2011,premium unleaded (required),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,56570
11908,Acura,ZDX,2011,premium unleaded (required),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,50520
11909,Acura,ZDX,2012,premium unleaded (required),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,46120
11910,Acura,ZDX,2012,premium unleaded (required),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,56670
11911,Acura,ZDX,2012,premium unleaded (required),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,50620
11912,Acura,ZDX,2013,premium unleaded (recommended),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,50920
11913,Lincoln,Zephyr,2006,regular unleaded,221.0,6.0,AUTOMATIC,front wheel drive,4.0,Luxury,Midsize,Sedan,26,17,61,28995


In [None]:
# Display Metadata about DataFrame
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11914 entries, 0 to 11913
Data columns (total 16 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Make               11914 non-null  object 
 1   Model              11914 non-null  object 
 2   Year               11914 non-null  int64  
 3   Engine Fuel Type   11911 non-null  object 
 4   Engine HP          11845 non-null  float64
 5   Engine Cylinders   11884 non-null  float64
 6   Transmission Type  11914 non-null  object 
 7   Driven_Wheels      11914 non-null  object 
 8   Number of Doors    11908 non-null  float64
 9   Market Category    8172 non-null   object 
 10  Vehicle Size       11914 non-null  object 
 11  Vehicle Style      11914 non-null  object 
 12  highway MPG        11914 non-null  int64  
 13  city mpg           11914 non-null  int64  
 14  Popularity         11914 non-null  int64  
 15  MSRP               11914 non-null  int64  
dtypes: float64(3), int64(5), object(8)

In [73]:
# Descriptive Statistics
df.describe()       # Default : Numerical Columns

Unnamed: 0,Year,Engine HP,Engine Cylinders,Number of Doors,highway MPG,city mpg,Popularity,MSRP
count,11914.0,11845.0,11884.0,11908.0,11914.0,11914.0,11914.0,11914.0
mean,2010.384338,249.38607,5.628829,3.436093,26.637485,19.733255,1554.911197,40594.74
std,7.57974,109.19187,1.780559,0.881315,8.863001,8.987798,1441.855347,60109.1
min,1990.0,55.0,0.0,2.0,12.0,7.0,2.0,2000.0
25%,2007.0,170.0,4.0,2.0,22.0,16.0,549.0,21000.0
50%,2015.0,227.0,6.0,4.0,26.0,18.0,1385.0,29995.0
75%,2016.0,300.0,6.0,4.0,30.0,22.0,2009.0,42231.25
max,2017.0,1001.0,16.0,4.0,354.0,137.0,5657.0,2065902.0


In [74]:
df.describe(include='object')       # Statistics for Categorical Columns

Unnamed: 0,Make,Model,Engine Fuel Type,Transmission Type,Driven_Wheels,Market Category,Vehicle Size,Vehicle Style
count,11914,11914,11911,11914,11914,8172,11914,11914
unique,48,915,10,5,4,71,3,16
top,Chevrolet,Silverado 1500,regular unleaded,AUTOMATIC,front wheel drive,Crossover,Compact,Sedan
freq,1123,156,7172,8266,4787,1110,4764,3048


*`Drop Rows/Columns` -> df.drop(columns=[.., .., ..], axis=1/0, inplace=True/False)*

In [None]:
# If you want to drop columns
df.drop(columns=['Make', 'Model'])   # columns= already targets the column axis, so axis=1 is not needed

Unnamed: 0,Year,Engine Fuel Type,Engine HP,Engine Cylinders,Transmission Type,Driven_Wheels,Number of Doors,Market Category,Vehicle Size,Vehicle Style,highway MPG,city mpg,Popularity,MSRP
0,2011,premium unleaded (required),335.0,6.0,MANUAL,rear wheel drive,2.0,"Factory Tuner,Luxury,High-Performance",Compact,Coupe,26,19,3916,46135
1,2011,premium unleaded (required),300.0,6.0,MANUAL,rear wheel drive,2.0,"Luxury,Performance",Compact,Convertible,28,19,3916,40650
2,2011,premium unleaded (required),300.0,6.0,MANUAL,rear wheel drive,2.0,"Luxury,High-Performance",Compact,Coupe,28,20,3916,36350
3,2011,premium unleaded (required),230.0,6.0,MANUAL,rear wheel drive,2.0,"Luxury,Performance",Compact,Coupe,28,18,3916,29450
4,2011,premium unleaded (required),230.0,6.0,MANUAL,rear wheel drive,2.0,Luxury,Compact,Convertible,28,18,3916,34500
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11909,2012,premium unleaded (required),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,46120
11910,2012,premium unleaded (required),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,56670
11911,2012,premium unleaded (required),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,50620
11912,2013,premium unleaded (recommended),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,50920


In [76]:
# If you want to change the original DF -> inplace=True
df.drop(columns=['Make', 'Model'], inplace=True)

In [None]:
# axis = 0 (default) -> labels refer to rows
df.drop([0, 1])     # Drop the rows labeled 0 and 1

Unnamed: 0,Year,Engine Fuel Type,Engine HP,Engine Cylinders,Transmission Type,Driven_Wheels,Number of Doors,Market Category,Vehicle Size,Vehicle Style,highway MPG,city mpg,Popularity,MSRP
2,2011,premium unleaded (required),300.0,6.0,MANUAL,rear wheel drive,2.0,"Luxury,High-Performance",Compact,Coupe,28,20,3916,36350
3,2011,premium unleaded (required),230.0,6.0,MANUAL,rear wheel drive,2.0,"Luxury,Performance",Compact,Coupe,28,18,3916,29450
4,2011,premium unleaded (required),230.0,6.0,MANUAL,rear wheel drive,2.0,Luxury,Compact,Convertible,28,18,3916,34500
5,2012,premium unleaded (required),230.0,6.0,MANUAL,rear wheel drive,2.0,"Luxury,Performance",Compact,Coupe,28,18,3916,31200
6,2012,premium unleaded (required),300.0,6.0,MANUAL,rear wheel drive,2.0,"Luxury,Performance",Compact,Convertible,26,17,3916,44100
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11909,2012,premium unleaded (required),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,46120
11910,2012,premium unleaded (required),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,56670
11911,2012,premium unleaded (required),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,50620
11912,2013,premium unleaded (recommended),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,50920
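
Without `inplace=True`, `drop` returns a new DataFrame and leaves the original untouched. A minimal sketch on a hypothetical two-row frame named `cars`:

```python
import pandas as pd

cars = pd.DataFrame({'Make': ['BMW', 'Acura'],
                     'Model': ['Z8', 'ZDX'],
                     'Year': [2002, 2011]})

trimmed = cars.drop(columns=['Make', 'Model'])   # returns a new DataFrame
print(list(cars.columns))      # the original still has all three columns
print(list(trimmed.columns))

no_first = cars.drop(index=[0])                  # drop a row by its index label
print(no_first)
```

Reassigning the result (`cars = cars.drop(...)`) is a common alternative to `inplace=True`.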


*`Cleaning Empty Cells` -> df.dropna() OR df.fillna()*

<img src='dropna.png'>

In [78]:
df.dropna()

Unnamed: 0,Year,Engine Fuel Type,Engine HP,Engine Cylinders,Transmission Type,Driven_Wheels,Number of Doors,Market Category,Vehicle Size,Vehicle Style,highway MPG,city mpg,Popularity,MSRP
0,2011,premium unleaded (required),335.0,6.0,MANUAL,rear wheel drive,2.0,"Factory Tuner,Luxury,High-Performance",Compact,Coupe,26,19,3916,46135
1,2011,premium unleaded (required),300.0,6.0,MANUAL,rear wheel drive,2.0,"Luxury,Performance",Compact,Convertible,28,19,3916,40650
2,2011,premium unleaded (required),300.0,6.0,MANUAL,rear wheel drive,2.0,"Luxury,High-Performance",Compact,Coupe,28,20,3916,36350
3,2011,premium unleaded (required),230.0,6.0,MANUAL,rear wheel drive,2.0,"Luxury,Performance",Compact,Coupe,28,18,3916,29450
4,2011,premium unleaded (required),230.0,6.0,MANUAL,rear wheel drive,2.0,Luxury,Compact,Convertible,28,18,3916,34500
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11909,2012,premium unleaded (required),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,46120
11910,2012,premium unleaded (required),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,56670
11911,2012,premium unleaded (required),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,50620
11912,2013,premium unleaded (recommended),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,50920


In [79]:
df.dropna(axis='columns')

Unnamed: 0,Year,Transmission Type,Driven_Wheels,Vehicle Size,Vehicle Style,highway MPG,city mpg,Popularity,MSRP
0,2011,MANUAL,rear wheel drive,Compact,Coupe,26,19,3916,46135
1,2011,MANUAL,rear wheel drive,Compact,Convertible,28,19,3916,40650
2,2011,MANUAL,rear wheel drive,Compact,Coupe,28,20,3916,36350
3,2011,MANUAL,rear wheel drive,Compact,Coupe,28,18,3916,29450
4,2011,MANUAL,rear wheel drive,Compact,Convertible,28,18,3916,34500
...,...,...,...,...,...,...,...,...,...
11909,2012,AUTOMATIC,all wheel drive,Midsize,4dr Hatchback,23,16,204,46120
11910,2012,AUTOMATIC,all wheel drive,Midsize,4dr Hatchback,23,16,204,56670
11911,2012,AUTOMATIC,all wheel drive,Midsize,4dr Hatchback,23,16,204,50620
11912,2013,AUTOMATIC,all wheel drive,Midsize,4dr Hatchback,23,16,204,50920


In [80]:
df.dropna(how='all')

Unnamed: 0,Year,Engine Fuel Type,Engine HP,Engine Cylinders,Transmission Type,Driven_Wheels,Number of Doors,Market Category,Vehicle Size,Vehicle Style,highway MPG,city mpg,Popularity,MSRP
0,2011,premium unleaded (required),335.0,6.0,MANUAL,rear wheel drive,2.0,"Factory Tuner,Luxury,High-Performance",Compact,Coupe,26,19,3916,46135
1,2011,premium unleaded (required),300.0,6.0,MANUAL,rear wheel drive,2.0,"Luxury,Performance",Compact,Convertible,28,19,3916,40650
2,2011,premium unleaded (required),300.0,6.0,MANUAL,rear wheel drive,2.0,"Luxury,High-Performance",Compact,Coupe,28,20,3916,36350
3,2011,premium unleaded (required),230.0,6.0,MANUAL,rear wheel drive,2.0,"Luxury,Performance",Compact,Coupe,28,18,3916,29450
4,2011,premium unleaded (required),230.0,6.0,MANUAL,rear wheel drive,2.0,Luxury,Compact,Convertible,28,18,3916,34500
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11909,2012,premium unleaded (required),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,46120
11910,2012,premium unleaded (required),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,56670
11911,2012,premium unleaded (required),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,50620
11912,2013,premium unleaded (recommended),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,50920


In [81]:
df.dropna(thresh=2)

Unnamed: 0,Year,Engine Fuel Type,Engine HP,Engine Cylinders,Transmission Type,Driven_Wheels,Number of Doors,Market Category,Vehicle Size,Vehicle Style,highway MPG,city mpg,Popularity,MSRP
0,2011,premium unleaded (required),335.0,6.0,MANUAL,rear wheel drive,2.0,"Factory Tuner,Luxury,High-Performance",Compact,Coupe,26,19,3916,46135
1,2011,premium unleaded (required),300.0,6.0,MANUAL,rear wheel drive,2.0,"Luxury,Performance",Compact,Convertible,28,19,3916,40650
2,2011,premium unleaded (required),300.0,6.0,MANUAL,rear wheel drive,2.0,"Luxury,High-Performance",Compact,Coupe,28,20,3916,36350
3,2011,premium unleaded (required),230.0,6.0,MANUAL,rear wheel drive,2.0,"Luxury,Performance",Compact,Coupe,28,18,3916,29450
4,2011,premium unleaded (required),230.0,6.0,MANUAL,rear wheel drive,2.0,Luxury,Compact,Convertible,28,18,3916,34500
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11909,2012,premium unleaded (required),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,46120
11910,2012,premium unleaded (required),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,56670
11911,2012,premium unleaded (required),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,50620
11912,2013,premium unleaded (recommended),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,50920
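
On the car data the `dropna` variants are hard to tell apart from the truncated output. A small sketch on a made-up frame (`df_small` is hypothetical) shows how `how` and `thresh` differ:

```python
import numpy as np
import pandas as pd

df_small = pd.DataFrame({'a': [1.0, np.nan, np.nan, np.nan],
                         'b': [2.0, 5.0, np.nan, np.nan],
                         'c': [3.0, 6.0, np.nan, 7.0]})

any_missing = df_small.dropna()            # drop rows with at least one NaN (default how='any')
all_missing = df_small.dropna(how='all')   # drop only rows where every value is NaN
at_least_2 = df_small.dropna(thresh=2)     # keep rows with at least 2 non-NaN values

print(list(any_missing.index))
print(list(all_missing.index))
print(list(at_least_2.index))
```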


<img src='fillna.png'>

In [82]:
df.fillna(0)   # Fill Empty Cells with 0

Unnamed: 0,Year,Engine Fuel Type,Engine HP,Engine Cylinders,Transmission Type,Driven_Wheels,Number of Doors,Market Category,Vehicle Size,Vehicle Style,highway MPG,city mpg,Popularity,MSRP
0,2011,premium unleaded (required),335.0,6.0,MANUAL,rear wheel drive,2.0,"Factory Tuner,Luxury,High-Performance",Compact,Coupe,26,19,3916,46135
1,2011,premium unleaded (required),300.0,6.0,MANUAL,rear wheel drive,2.0,"Luxury,Performance",Compact,Convertible,28,19,3916,40650
2,2011,premium unleaded (required),300.0,6.0,MANUAL,rear wheel drive,2.0,"Luxury,High-Performance",Compact,Coupe,28,20,3916,36350
3,2011,premium unleaded (required),230.0,6.0,MANUAL,rear wheel drive,2.0,"Luxury,Performance",Compact,Coupe,28,18,3916,29450
4,2011,premium unleaded (required),230.0,6.0,MANUAL,rear wheel drive,2.0,Luxury,Compact,Convertible,28,18,3916,34500
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11909,2012,premium unleaded (required),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,46120
11910,2012,premium unleaded (required),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,56670
11911,2012,premium unleaded (required),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,50620
11912,2013,premium unleaded (recommended),300.0,6.0,AUTOMATIC,all wheel drive,4.0,"Crossover,Hatchback,Luxury",Midsize,4dr Hatchback,23,16,204,50920
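
`df.fillna(0)` also writes 0 into text columns such as `Market Category`. One alternative is a per-column fill passed as a dictionary. This is a sketch on a hypothetical three-row frame, and the choice of the column mean and the label `'Unknown'` is an assumption, not a recommendation from the source:

```python
import numpy as np
import pandas as pd

cars = pd.DataFrame({'Engine HP': [300.0, np.nan, 230.0],
                     'Market Category': ['Luxury', None, 'Luxury']})

# Numeric column -> its mean; text column -> a placeholder label.
filled = cars.fillna({'Engine HP': cars['Engine HP'].mean(),
                      'Market Category': 'Unknown'})
print(filled)
```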


*`Find Duplicates and Remove them`*

<img src='dup.png'>

In [None]:
# How many duplicated rows?
df.duplicated().sum()

np.int64(724)
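
The cell above only counts the duplicates. A minimal sketch of removing them with `drop_duplicates`, on a hypothetical four-row frame named `df_dup`:

```python
import pandas as pd

df_dup = pd.DataFrame({'Year': [2011, 2011, 2012, 2011],
                       'MSRP': [46135, 46135, 31200, 46135]})

print(df_dup.duplicated().sum())      # rows that repeat an earlier row

deduped = df_dup.drop_duplicates()    # keeps the first occurrence by default
deduped = deduped.reset_index(drop=True)   # renumber the remaining rows
print(deduped)
```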

*`Set & Reset Index`*
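
The notebook gives no cell for this topic. As a hedged sketch, `set_index` and `reset_index` could be used like this (`df_idx` and its values are made up):

```python
import pandas as pd

df_idx = pd.DataFrame({'month': ['June', 'July'], 'Apples': [0, 3]})

by_month = df_idx.set_index('month')    # 'month' becomes the row labels
print(by_month.loc['July', 'Apples'])

restored = by_month.reset_index()       # default integer index; 'month' is a column again
print(restored)
```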

*`Group By & Aggregation Methods`*
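
No cell is provided for this topic either. A minimal sketch of grouping and aggregating, on a hypothetical `sales` frame that borrows a few column names from the car data:

```python
import pandas as pd

sales = pd.DataFrame({'Make': ['BMW', 'BMW', 'Acura', 'Acura'],
                      'MSRP': [46135, 40650, 46020, 56570],
                      'city mpg': [19, 19, 16, 16]})

# One aggregate of one column per group.
mean_price = sales.groupby('Make')['MSRP'].mean()

# Several named aggregates at once: new_name=(column, function).
summary = sales.groupby('Make').agg(avg_msrp=('MSRP', 'mean'),
                                    min_mpg=('city mpg', 'min'),
                                    n_rows=('MSRP', 'count'))
print(mean_price)
print(summary)
```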

*`Sorting`*
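
A short sketch of sorting by values and by index, since the notebook lists the topic without a cell. `df_sort` is a made-up frame:

```python
import pandas as pd

df_sort = pd.DataFrame({'Make': ['BMW', 'Acura', 'BMW'],
                        'MSRP': [29450, 46020, 46135]})

by_price = df_sort.sort_values('MSRP', ascending=False)    # most expensive first
by_make_then_price = df_sort.sort_values(['Make', 'MSRP'],
                                         ascending=[True, False])
back_in_order = by_price.sort_index()                      # restore the original row order

print(by_price)
print(by_make_then_price)
```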

*`Data Integration`*
- Append
- Concatenate
- Merge
- Join
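
A hedged sketch of three of the operations listed above, on two made-up frames `left` and `right`. `DataFrame.append` was removed in pandas 2.0, so row-wise stacking is shown with `pd.concat` instead.

```python
import pandas as pd

left = pd.DataFrame({'key': ['A', 'B', 'C'], 'x': [1, 2, 3]})
right = pd.DataFrame({'key': ['A', 'C', 'D'], 'y': [10, 30, 40]})

# Concatenate: stack rows (this also covers the old append use case).
stacked = pd.concat([left, left], ignore_index=True)

# Merge: SQL-style join on a shared column.
merged = pd.merge(left, right, on='key', how='inner')

# Join: combine on the index; unmatched rows on the right give NaN.
joined = left.set_index('key').join(right.set_index('key'), how='left')

print(stacked)
print(merged)
print(joined)
```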

*`Export DataFrames To CSV File`*
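
A minimal sketch of writing a DataFrame with `to_csv` and reading it back. The temporary directory and the file name `out.csv` are placeholders chosen for this example.

```python
import os
import tempfile

import pandas as pd

df_out = pd.DataFrame({'A': [1, 4], 'B': [2, 5]}, index=['X', 'Y'])

path = os.path.join(tempfile.mkdtemp(), 'out.csv')   # hypothetical output location
df_out.to_csv(path, index=True)    # index=False would leave out the row labels

round_trip = pd.read_csv(path, index_col=0)   # first CSV column becomes the index again
print(round_trip)
```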