# Pandas

Pandas is one of the most powerful and widely used libraries in the Python ecosystem. It is built on top of NumPy and provides data structures and data analysis tools for Python.

## Why Pandas

- Simple to use
- Integrates with many other Python data science and ML tools
- Helps you get your data ready for machine learning
- Lets you manipulate your data the way you need

In [1]:
import pandas as pd

### Data Structures 

In [2]:
# Series - a 1D labelled array is known as a Series
# It accepts a Python list as input
s1 = pd.Series([1,2,3])
s1

0    1
1    2
2    3
dtype: int64
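The left-hand column in the output above is the index, which defaults to 0, 1, 2. As a minimal sketch, the same values can be given custom labels; the labels `"a"`, `"b"`, `"c"` and the name `s2` are assumptions chosen only for illustration.

```python
import pandas as pd

# A Series pairs each value with an index label (0, 1, 2 by default).
s1 = pd.Series([1, 2, 3])

# Hypothetical custom labels, used here only for illustration.
s2 = pd.Series([1, 2, 3], index=["a", "b", "c"])

print(s1.index.tolist())  # the default integer labels
print(s2["b"])            # look up a value by its label
```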

In [4]:
# DataFrame - a 2D table of rows and columns is known as a DataFrame. DataFrames are far more common than Series
# It accepts a Python dictionary as input
car_make = ['Honda', 'Tesla', 'Nissan']
car_colour = ['Red', 'Blue', 'Green']
df = pd.DataFrame({"Car make": car_make, "Colour" : car_colour})
df

Unnamed: 0,Car make,Colour
0,Honda,Red
1,Tesla,Blue
2,Nissan,Green
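A dictionary of lists describes the table column by column. As a sketch of an alternative, `pd.DataFrame` also accepts a list of dictionaries, one per row. The variable names `from_columns` and `from_rows` are assumptions for this example.

```python
import pandas as pd

# Column-oriented input: each key is a column name, each list holds that column's values.
from_columns = pd.DataFrame({
    "Car make": ["Honda", "Tesla"],
    "Colour": ["Red", "Blue"],
})

# Row-oriented input: each dictionary is one row.
from_rows = pd.DataFrame([
    {"Car make": "Honda", "Colour": "Red"},
    {"Car make": "Tesla", "Colour": "Blue"},
])

print(from_columns.equals(from_rows))  # both inputs build the same table
print(from_columns.shape)              # (rows, columns)
```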


## Importing a CSV file

In [6]:
# CSV stands for comma-separated values. It is a common data storage format, and pandas works well with CSV files
car_sales = pd.read_csv('car-sales.csv')
car_sales

Unnamed: 0,Make,Colour,Odometer (KM),Doors,Price
0,Toyota,White,150043,4,"$4,000.00"
1,Honda,Red,87899,4,"$5,000.00"
2,Toyota,Blue,32549,3,"$7,000.00"
3,BMW,Black,11179,5,"$22,000.00"
4,Nissan,White,213095,4,"$3,500.00"
5,Toyota,Green,99213,4,"$4,500.00"
6,Honda,Blue,45698,4,"$7,500.00"
7,Honda,Blue,54738,4,"$7,000.00"
8,Toyota,White,60000,4,"$6,250.00"
9,Nissan,White,31600,4,"$9,700.00"
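To try `read_csv` without the `car-sales.csv` file on disk, the sketch below parses CSV text held in memory. The three rows are copied from the output above; `csv_text` and `sample` are assumed names for this example.

```python
import io
import pandas as pd

# A few rows copied from the table above, held in a string instead of a file.
csv_text = '''Make,Colour,Odometer (KM),Doors,Price
Toyota,White,150043,4,"$4,000.00"
Honda,Red,87899,4,"$5,000.00"
BMW,Black,11179,5,"$22,000.00"
'''

# read_csv accepts a file path or any file-like object such as StringIO.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)
print(sample.columns.tolist())
```

The quoted `Price` values contain commas, so the quotes are what keep each price in a single field.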


## Anatomy of a DataFrame
<img src="pandas-anatomy-of-a-dataframe.png" />

## Exporting a DataFrame

In [7]:
car_sales.to_csv("new_car_sales.csv") # saves the DataFrame to disk in CSV format
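By default `to_csv` also writes the row index as an unnamed first column, which comes back as `Unnamed: 0` when the file is read again. The sketch below shows the effect of `index=False` using in-memory buffers rather than files; the small DataFrame and the variable names are assumptions for illustration.

```python
import io
import pandas as pd

df = pd.DataFrame({"Make": ["Toyota", "Honda"], "Doors": [4, 4]})

# Default behaviour: the index is written as an unnamed first column.
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
cols_with = pd.read_csv(with_index).columns.tolist()

# index=False leaves the index out of the exported data.
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
cols_without = pd.read_csv(without_index).columns.tolist()

print(cols_with)     # includes 'Unnamed: 0'
print(cols_without)  # only the original columns
```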

## Describing Data with Pandas

In [8]:
car_sales = pd.read_csv('car-sales.csv')
car_sales

Unnamed: 0,Make,Colour,Odometer (KM),Doors,Price
0,Toyota,White,150043,4,"$4,000.00"
1,Honda,Red,87899,4,"$5,000.00"
2,Toyota,Blue,32549,3,"$7,000.00"
3,BMW,Black,11179,5,"$22,000.00"
4,Nissan,White,213095,4,"$3,500.00"
5,Toyota,Green,99213,4,"$4,500.00"
6,Honda,Blue,45698,4,"$7,500.00"
7,Honda,Blue,54738,4,"$7,000.00"
8,Toyota,White,60000,4,"$6,250.00"
9,Nissan,White,31600,4,"$9,700.00"


In [14]:
car_sales.describe()   # describe() returns summary statistics for the numeric columns

Unnamed: 0,Odometer (KM),Doors
count,10.0,10.0
mean,78601.4,4.0
std,61983.471735,0.471405
min,11179.0,3.0
25%,35836.25,4.0
50%,57369.0,4.0
75%,96384.5,4.0
max,213095.0,5.0
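The output above covers only `Odometer (KM)` and `Doors` because `describe()` skips non-numeric columns by default. As a sketch, passing `include="all"` adds the text columns too, reporting `unique`, `top` and `freq` for them. The small DataFrame here is an assumption for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Make": ["Toyota", "Honda", "Toyota"],
    "Doors": [4, 4, 3],
})

# Default: only numeric columns are summarised.
numeric_summary = df.describe()

# include="all": text columns are summarised as well.
full_summary = df.describe(include="all")

print(numeric_summary.columns.tolist())
print(full_summary.loc[["unique", "top", "freq"], "Make"])
```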


In [12]:
car_sales.columns    # gives the names of all columns

Index(['Make', 'Colour', 'Odometer (KM)', 'Doors', 'Price'], dtype='object')

In [15]:
car_sales.index      # gives the index of the dataframe rows 

RangeIndex(start=0, stop=10, step=1)

In [16]:
car_sales.dtypes     # returns the data type of each column

Make             object
Colour           object
Odometer (KM)     int64
Doors             int64
Price            object
dtype: object
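`Price` is stored as text because its values contain `$` and `,`. One possible way to turn such strings into numbers is sketched below; this is an assumption about how the column could be cleaned, not a step taken in this notebook, and `price` and `cleaned` are hypothetical names.

```python
import pandas as pd

# Price strings in the same format as the car sales data.
price = pd.Series(["$4,000.00", "$5,000.00", "$22,000.00"])

# Strip the currency symbol and thousands separator, then convert to float.
cleaned = (
    price.str.replace("$", "", regex=False)
         .str.replace(",", "", regex=False)
         .astype(float)
)

print(cleaned.dtype)
print(cleaned.tolist())
```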

In [20]:
car_sales.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Make           10 non-null     object
 1   Colour         10 non-null     object
 2   Odometer (KM)  10 non-null     int64 
 3   Doors          10 non-null     int64 
 4   Price          10 non-null     object
dtypes: int64(2), object(3)
memory usage: 528.0+ bytes


In [24]:
car_sales.mean(numeric_only=True)   # gives the mean of all numeric columns; numeric_only=True skips the text columns


Odometer (KM)    78601.4
Doors                4.0
dtype: float64

In [23]:
car_sales.sum()     # sums the numeric columns and concatenates the string (object) columns

Make             ToyotaHondaToyotaBMWNissanToyotaHondaHondaToyo...
Colour               WhiteRedBlueBlackWhiteGreenBlueBlueWhiteWhite
Odometer (KM)                                               786014
Doors                                                           40
Price            $4,000.00$5,000.00$7,000.00$22,000.00$3,500.00...
dtype: object
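Concatenated strings are rarely useful as a total. As a sketch, both `mean` and `sum` accept `numeric_only=True`, which restricts the result to numeric columns. The three-row DataFrame below reuses values from the car sales data; the variable names are assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "Make": ["Toyota", "Honda", "Toyota"],
    "Odometer (KM)": [150043, 87899, 32549],
    "Doors": [4, 4, 3],
})

# numeric_only=True drops the text column from the result.
means = df.mean(numeric_only=True)
totals = df.sum(numeric_only=True)

print(means)
print(totals)
```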

## Selecting Particular Columns and Rows in a DataFrame

In [25]:
car_sales["Doors"]

0    4
1    4
2    3
3    5
4    4
5    4
6    4
7    4
8    4
9    4
Name: Doors, dtype: int64

In [26]:
car_sales["Price"]

0     $4,000.00
1     $5,000.00
2     $7,000.00
3    $22,000.00
4     $3,500.00
5     $4,500.00
6     $7,500.00
7     $7,000.00
8     $6,250.00
9     $9,700.00
Name: Price, dtype: object

In [29]:
car_sales.Make

0    Toyota
1     Honda
2    Toyota
3       BMW
4    Nissan
5    Toyota
6     Honda
7     Honda
8    Toyota
9    Nissan
Name: Make, dtype: object
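The cells above select columns with brackets and with attribute access. The sketch below compares the two and adds row selection with `iloc` (by position) and `loc` (by index label), which this section's heading mentions but the cells do not show. The small DataFrame and variable names are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Make": ["Toyota", "Honda", "BMW"],
    "Odometer (KM)": [150043, 87899, 11179],
})

# Bracket and attribute access return the same column.
same_column = df["Make"].equals(df.Make)

# Column names with spaces or symbols only work with brackets.
odometer = df["Odometer (KM)"]

# A list of names selects several columns and returns a DataFrame.
subset = df[["Make", "Odometer (KM)"]]

# Rows: iloc selects by position, loc selects by index label.
first_row = df.iloc[0]
honda_make = df.loc[1, "Make"]

print(same_column)
print(first_row)
print(honda_make)
```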

In [31]:
car_sales['Doors'].sum()

40

In [32]:
len(car_sales)       # returns the number of rows in the DataFrame

10

In [34]:
car_sales.count()    # returns the number of non-NA values in each column

Make             10
Colour           10
Odometer (KM)    10
Doors            10
Price            10
dtype: int64
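In the car sales data `len` and `count` agree because no values are missing. The sketch below uses a small DataFrame with deliberately missing entries (an assumption for illustration) to show where they differ.

```python
import pandas as pd

# One value is missing in each column.
df = pd.DataFrame({
    "Make": ["Toyota", "Honda", None],
    "Doors": [4.0, None, 4.0],
})

n_rows = len(df)       # counts every row, missing values or not
non_na = df.count()    # counts only the non-NA values per column

print(n_rows)
print(non_na)
```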