# Pandas in Python

<hr>

Prerequisites: the Python programming language, including built-in data structures such as:

- Dictionaries
- Lists
- Tuples
- Sets

Pandas (<b>PAN</b>el <b>DA</b>ta) is a Python library that provides fast, easy-to-use tools for data analysis and manipulation. Its two main data structures are the DataFrame, a two-dimensional labeled table, and the Series, a one-dimensional labeled array that is commonly used for time series data.
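Here is a minimal sketch of the two structures. The temperature values and the 2024 dates are made up for illustration. A Series indexed by dates behaves as a time series, and a DataFrame is a collection of Series that share one row index.

```python
import pandas as pd

# Series: one-dimensional values plus an index of labels.
# With a DatetimeIndex it behaves as a time series.
daily_temp = pd.Series(
    [21.5, 23.0, 19.8],
    index=pd.date_range("2024-01-01", periods=3, freq="D"),
    name="temp_c",
)

# DataFrame: a two-dimensional table whose columns are Series sharing
# one row index; each column can have its own dtype.
weather = pd.DataFrame(
    {"temp_c": [21.5, 23.0, 19.8], "rainy": [False, True, True]},
    index=daily_temp.index,
)

print(daily_temp.loc[pd.Timestamp("2024-01-02")])  # 23.0
print(type(weather["rainy"]).__name__)              # Series
print(weather.dtypes)
```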

Pandas was designed to:

1. Provide data structures that handle both time series and non-time-series data.
2. Support arithmetic operations and reductions that preserve the metadata of the data structures (index and column labels), aligning values by label.
3. Support relational operations found in databases such as SQL (joins, group-by, etc.).
4. Handle missing data.
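The sketch below illustrates goals 2 and 4 with two small, made-up Series. Addition lines values up by index label rather than by position, and labels found in only one operand become missing values (NaN). Passing `fill_value` to `Series.add` is one way to handle those gaps explicitly.

```python
import pandas as pd

# Two made-up Series whose index labels only partly overlap.
q1 = pd.Series({"apples": 10, "pears": 4, "plums": 7})
q2 = pd.Series({"apples": 3, "plums": 2, "kiwis": 5})

# Addition aligns on index labels (the metadata), not on position;
# labels present in only one Series give NaN, i.e. missing data.
total = q1 + q2
print(total)

# Missing values can be counted ...
print(total.isna().sum())  # 2

# ... or avoided by treating an absent label as 0.
filled = q1.add(q2, fill_value=0)
print(filled)
```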

In the code below, we create a pandas DataFrame: a tabular data structure that resembles a spreadsheet, such as one in Excel.

If you are familiar with SQL, you can think of a DataFrame as a SQL table.

The DataFrame has four columns, each holding a different data type (string, float, integer and boolean).

In [1]:
import pandas as pd


#create data 

states  = ['Texas', 'Rhode Island', 'Nebraska']  #String type

population = [27.86E6, 1.06E6, 1.91E6] #Float type

electoral_votes = [38, 3, 5]  #Integer type

is_west_of_MS = [True, False, True]  #Boolean type, MS = Mississippi River

#Create and display dataframe

headers = ('State', 'Population', 'Electoral Votes', 'West of Mississippi')

data = (states, population, electoral_votes, is_west_of_MS)

#zip() takes iterables as arguments and returns an iterator of tuples,
#pairing up the items that share the same position in each iterable.
#It accepts any iterable: lists, tuples, strings, dictionaries, files and so on.
#dict() then turns those (key, value) tuples into a dictionary.

#inside story: zip(headers, data) yields
# ('State', states), ('Population', population),
# ('Electoral Votes', electoral_votes), ('West of Mississippi', is_west_of_MS)
#and dict() converts these pairs into
# {'State': states, 'Population': population, 'Electoral Votes': electoral_votes, 'West of Mississippi': is_west_of_MS}
#Each key becomes a column header and each list becomes that column's values.


data_dict = dict(zip(headers, data))

df1 = pd.DataFrame(data_dict)

df1

|   | State | Population | Electoral Votes | West of Mississippi |
|---|---|---|---|---|
| 0 | Texas | 27860000.0 | 38 | True |
| 1 | Rhode Island | 1060000.0 | 3 | False |
| 2 | Nebraska | 1910000.0 | 5 | True |


In [2]:
print(df1)

          State  Population  Electoral Votes  West of Mississippi
0         Texas  27860000.0               38                 True
1  Rhode Island   1060000.0                3                False
2      Nebraska   1910000.0                5                 True
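To connect the table above with the SQL view of a DataFrame, here is a sketch that rebuilds `df1`, checks its per-column dtypes, and runs rough pandas equivalents of `WHERE`, `GROUP BY` and `JOIN`. The `abbrev` table of postal codes is a hypothetical addition used only to demonstrate `merge`. Newer pandas versions may report the `State` column with a dedicated string dtype rather than `object`.

```python
import pandas as pd

# Rebuild df1 from the cell above so this snippet runs on its own.
df1 = pd.DataFrame({
    "State": ["Texas", "Rhode Island", "Nebraska"],
    "Population": [27.86e6, 1.06e6, 1.91e6],
    "Electoral Votes": [38, 3, 5],
    "West of Mississippi": [True, False, True],
})

# One dtype per column: text, float64, int64 and bool.
print(df1.dtypes)

# WHERE: states west of the Mississippi.
west = df1.loc[df1["West of Mississippi"], "State"]
print(west.tolist())  # ['Texas', 'Nebraska']

# GROUP BY: total electoral votes on each side of the river.
votes_by_side = df1.groupby("West of Mississippi")["Electoral Votes"].sum()
print(votes_by_side)

# JOIN: a left join against a hypothetical table of postal codes.
abbrev = pd.DataFrame({"State": ["Texas", "Nebraska"], "Code": ["TX", "NE"]})
joined = df1.merge(abbrev, on="State", how="left")  # Rhode Island gets NaN
print(joined)
```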


### A simple example of dict(zip())

keys = ['a', 'b', 'c']

values = [1, 2, 3]

dictionary = dict(zip(keys, values))

print(dictionary)


Output: {'a': 1, 'b': 2, 'c': 3}
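One behavior worth knowing before relying on `dict(zip(headers, data))`: `zip()` stops at the shortest iterable, so a header without a matching column (or vice versa) is silently dropped. A small sketch using the same made-up keys; the `strict=True` check requires Python 3.10 or newer.

```python
import sys

keys = ['a', 'b', 'c']
values = [1, 2]  # one value short

# zip() stops at the shortest iterable, so 'c' is dropped without any error.
short = dict(zip(keys, values))
print(short)  # {'a': 1, 'b': 2}

# Python 3.10+ can turn the silent truncation into an error with strict=True.
if sys.version_info >= (3, 10):
    try:
        dict(zip(keys, values, strict=True))
    except ValueError as err:
        print("length mismatch:", err)
```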

![](pandas_1.jpeg)

The following cells use the UCI Automobile dataset (file `imports-85.data`). Dataset link: https://archive.ics.uci.edu/ml/machine-learning-databases/autos/

In [3]:
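#header=None: the file has no header row, so the columns are numbered 0..25
#na_values="?": the file marks missing values with '?', which are read as NaN
data = pd.read_csv("imports-85.data", na_values="?", header=None)
data.head()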

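|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | NaN | alfa-romero | gas | std | two | convertible | rwd | front | 88.6 | ... | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111.0 | 5000.0 | 21 | 27 | 13495.0 |
| 1 | 3 | NaN | alfa-romero | gas | std | two | convertible | rwd | front | 88.6 | ... | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111.0 | 5000.0 | 21 | 27 | 16500.0 |
| 2 | 1 | NaN | alfa-romero | gas | std | two | hatchback | rwd | front | 94.5 | ... | 152 | mpfi | 2.68 | 3.47 | 9.0 | 154.0 | 5000.0 | 19 | 26 | 16500.0 |
| 3 | 2 | 164.0 | audi | gas | std | four | sedan | fwd | front | 99.8 | ... | 109 | mpfi | 3.19 | 3.4 | 10.0 | 102.0 | 5500.0 | 24 | 30 | 13950.0 |
| 4 | 2 | 164.0 | audi | gas | std | four | sedan | 4wd | front | 99.4 | ... | 136 | mpfi | 3.19 | 3.4 | 8.0 | 115.0 | 5500.0 | 18 | 22 | 17450.0 |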
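Because the real file has to be downloaded first, here is a self-contained sketch that feeds `read_csv` a few made-up lines in the same style (no header row, `?` for missing values) through `io.StringIO`. The names given to the five toy columns with `set_axis` are borrowed from the dataset's attribute list purely for readability.

```python
import io
import pandas as pd

# Made-up miniature file in the same style as imports-85.data:
# no header row, '?' where a value is missing.
raw = io.StringIO(
    "3,?,alfa-romero,111,13495\n"
    "2,164,audi,102,?\n"
    "1,?,audi,115,17450\n"
)

# header=None -> columns are labeled 0..4 by position.
# na_values="?" -> every '?' is parsed as NaN instead of the string '?'.
cars = pd.read_csv(raw, header=None, na_values="?")
print(cars)

# Missing values per column.
print(cars.isna().sum())

# Optional: replace positional labels with readable (illustrative) names.
named = cars.set_axis(["symboling", "normalized-losses", "make", "horsepower", "price"], axis=1)
print(named.columns.tolist())
```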


In [4]:
# type of data is DataFrame
type(data)

pandas.core.frame.DataFrame
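Each column of a DataFrame is a Series. The sketch below reproduces columns 0 and 2 of the first three rows shown earlier to show what single- and double-bracket selection return.

```python
import pandas as pd

# Columns 0 and 2 of the first three rows from the output above.
data = pd.DataFrame({0: [3, 3, 1], 2: ["alfa-romero", "alfa-romero", "alfa-romero"]})

print(type(data))       # <class 'pandas.core.frame.DataFrame'>
print(type(data[2]))    # one label -> a Series (a single column)
print(type(data[[2]]))  # a list of labels -> a DataFrame
print(data.shape)       # (3, 2): rows, columns
```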

In [5]:
#the data type of each column is available through DataFrame.dtypes

data.dtypes

0       int64
1     float64
2      object
3      object
4      object
5      object
6      object
7      object
8      object
9     float64
10    float64
11    float64
12    float64
13      int64
14     object
15     object
16      int64
17     object
18    float64
19    float64
20    float64
21    float64
22    float64
23      int64
24      int64
25    float64
dtype: object
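The `float64` type of column 1 deserves a note: the column holds whole numbers, but `NaN` (which `na_values="?"` produced) is a floating-point value, so pandas stores the whole column as floats. A sketch with made-up values showing this, the opt-in nullable `Int64` dtype, and `select_dtypes` for separating numeric columns from text columns:

```python
import numpy as np
import pandas as pd

# Whole numbers plus one missing value: NaN is a float, so the dtype is float64.
s = pd.Series([164, np.nan, 158])
print(s.dtype)  # float64

# The opt-in nullable integer dtype keeps integers and shows the gap as <NA>.
s_int = s.astype("Int64")
print(s_int)

# select_dtypes separates numeric columns from text columns (made-up values).
df = pd.DataFrame({"make": ["audi", "bmw"], "price": [13950.0, 16430.0], "doors": [4, 2]})
numeric = df.select_dtypes(include="number")
print(numeric.columns.tolist())  # ['price', 'doors']
```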