What does Pandas do?
- Reads data from a variety of sources
- Processes, visualizes, and wrangles data
- Works like Excel for Python

# DataFrame

In [1]:
import numpy as np
import pandas as pd
import random

In [17]:
#create a NumPy array of random integers
np.random.seed(100) #fix the seed so the array is the same on every run
arr = np.random.randint(0, 100, (5, 3))
print(arr)

[[ 8 24 67]
 [87 79 48]
 [10 94 52]
 [98 53 66]
 [98 14 34]]


When a DataFrame is created from an array, default column names and row indexes (0, 1, 2, ...) are generated.

In [18]:
df = pd.DataFrame(arr)
df
#the DataFrame assigns column names and row indexes by itself

    0   1   2
0   8  24  67
1  87  79  48
2  10  94  52
3  98  53  66
4  98  14  34


In [20]:
type(df)
#the type of df is dataframe object

pandas.core.frame.DataFrame

You can also supply your own row and column labels through the index and columns arguments.

In [21]:
rownames = ["Mon", "Tue", "Wed", "Thu", "Fri"]
columnnames = ["Jan", "Feb", "Mar"]

In [23]:
df = pd.DataFrame(arr, index = rownames, columns = columnnames)
df

     Jan  Feb  Mar
Mon    8   24   67
Tue   87   79   48
Wed   10   94   52
Thu   98   53   66
Fri   98   14   34
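The custom labels are more than decoration: they can be used to look up cells. The following is a minimal sketch, assuming the same seed and array as above; the names `labeled`, `tue_feb`, and `same_cell` are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Same seed and shape as the cells above, so the values match.
np.random.seed(100)
arr = np.random.randint(0, 100, (5, 3))

labeled = pd.DataFrame(
    arr,
    index=["Mon", "Tue", "Wed", "Thu", "Fri"],
    columns=["Jan", "Feb", "Mar"],
)

# Look up one cell by its row and column labels...
tue_feb = labeled.loc["Tue", "Feb"]
# ...and the same cell by integer position (second row, second column).
same_cell = labeled.iloc[1, 1]

print(tue_feb, same_cell)
```

Both lookups return the same value; `.loc` uses the labels, while `.iloc` ignores them and uses positions.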


Creating a DataFrame from a dictionary

In [27]:
mydict = {
    'Jan' : [1, 2, 3, 4, 5],
    'Feb' : [10, 20, 30, 40, 50],
    'Mar' : [15, 25, 35, 45, 55],
}

#dataframe from dict
df = pd.DataFrame(mydict,
                  index = ["Mon", "Tue", "Wed", "Thu", "Fri"],
                  columns = ["Mar", "Jan", "Feb"])

#the columns appear in the order given in the columns argument, not the dictionary order
df

     Mar  Jan  Feb
Mon   15    1   10
Tue   25    2   20
Wed   35    3   30
Thu   45    4   40
Fri   55    5   50
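With dictionary input, the columns argument also selects which keys are used. The sketch below reuses `mydict` from the cell above; `"Apr"` is a hypothetical column name that is deliberately not a key of the dictionary, and `subset` is an illustrative variable name.

```python
import pandas as pd

mydict = {
    "Jan": [1, 2, 3, 4, 5],
    "Feb": [10, 20, 30, 40, 50],
    "Mar": [15, 25, 35, 45, 55],
}

# "Apr" is a hypothetical name that does not exist in mydict.
subset = pd.DataFrame(mydict, columns=["Mar", "Apr"])

# Only the requested columns are kept, in the requested order.
print(list(subset.columns))

# A requested column with no matching key is filled with missing values.
print(subset["Apr"].isna().all())
```

Keys that are not listed ("Jan" and "Feb" here) are dropped, so it is worth checking the spelling of each name passed to columns.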


Reading Data from files

Pandas supports data in various file formats.
One of the most common is CSV; use pd.read_csv to import it.

In [38]:
df = pd.read_csv("E:/CODING/GitHub Repositories/PandasCheatSheet/50_states.csv")

df

         state     x     y
0      Alabama   139   -77
1       Alaska  -204  -170
2      Arizona  -203   -40
3     Arkansas    57   -53
4   California  -297    13
5     Colorado  -112    20
6  Connecticut   297    96
7     Delaware   275    42
8      Florida   220  -145
9      Georgia   182   -75


To see the first 5 rows, use df.head(). Likewise, use df.tail() for the last 5 rows.

In [39]:
df.head()

        state     x     y
0     Alabama   139   -77
1      Alaska  -204  -170
2     Arizona  -203   -40
3    Arkansas    57   -53
4  California  -297    13


In [40]:
df.tail()

            state     x    y
45       Virginia   234   12
46     Washington  -257  193
47  West Virginia   200   20
48      Wisconsin    83  113
49        Wyoming  -134   90
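Both methods accept an optional row count, which helps when 5 rows is too many or too few. A small sketch, using a hypothetical 8-row frame named `demo` in place of the CSV file:

```python
import pandas as pd

# Hypothetical 8-row frame standing in for the CSV data above.
demo = pd.DataFrame({"n": range(8)})

first_three = demo.head(3)  # first 3 rows
last_two = demo.tail(2)     # last 2 rows

print(first_three)
print(last_two)
```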


The shape of a DataFrame is a tuple holding the number of rows and the number of columns.

In [41]:
df.shape

(50, 3)
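Because shape is a plain tuple, it can be unpacked directly. The sketch below builds a two-row frame from the first two rows shown earlier; the names `demo`, `n_rows`, and `n_cols` are illustrative.

```python
import pandas as pd

# Two rows copied from the states data shown above.
demo = pd.DataFrame(
    {"state": ["Alabama", "Alaska"], "x": [139, -204], "y": [-77, -170]}
)

# Unpack the (rows, columns) tuple into two variables.
n_rows, n_cols = demo.shape
print(n_rows, n_cols)

# len() on a DataFrame gives the row count only.
print(len(demo))
```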

To get the underlying NumPy array behind the DataFrame, use the .values attribute

In [42]:
df.values
#running this returns the data as a raw NumPy array
#the columns hold different data types (strings and integers)
#a NumPy array has a single dtype, while a DataFrame can hold a different dtype per column
#so the combined array is stored with the general-purpose object dtype

array([['Alabama', 139, -77],
       ['Alaska', -204, -170],
       ['Arizona', -203, -40],
       ['Arkansas', 57, -53],
       ['California', -297, 13],
       ['Colorado', -112, 20],
       ['Connecticut', 297, 96],
       ['Delaware', 275, 42],
       ['Florida', 220, -145],
       ['Georgia', 182, -75],
       ['Hawaii', -317, -143],
       ['Idaho', -216, 122],
       ['Illinois', 95, 37],
       ['Indiana', 133, 39],
       ['Iowa', 38, 65],
       ['Kansas', -17, 5],
       ['Kentucky', 149, 1],
       ['Louisiana', 59, -114],
       ['Maine', 319, 164],
       ['Maryland', 288, 27],
       ['Massachusetts', 312, 112],
       ['Michigan', 148, 101],
       ['Minnesota', 23, 135],
       ['Mississippi', 94, -78],
       ['Missouri', 49, 6],
       ['Montana', -141, 150],
       ['Nebraska', -61, 66],
       ['Nevada', -257, 56],
       ['New Hampshire', 302, 127],
       ['New Jersey', 282, 65],
       ['New Mexico', -128, -43],
       ['New York', 236, 104],
       ['North Caro
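The dtype behaviour described in the comments can be checked on a small frame. This is a sketch under the assumption that a two-row sample behaves like the full file; `mixed` and `numeric` are illustrative names.

```python
import pandas as pd

# Two rows copied from the states data: one text column, one integer column.
mixed = pd.DataFrame({"state": ["Alabama", "Alaska"], "x": [139, -204]})

# Selecting only the integer column gives a frame with a single dtype.
numeric = mixed[["x"]]

# Each DataFrame column keeps its own dtype.
print(mixed.dtypes)

# Mixed columns collapse to one object-dtype array.
print(mixed.values.dtype)

# A frame with only integer columns keeps an integer dtype.
print(numeric.values.dtype)
```

The pandas documentation recommends `DataFrame.to_numpy()` over `.values` for new code; both return the same kind of array here.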

You can import data from delimited text files as well. Pass the correct separator with sep, because pd.read_table assumes tab-separated data by default.

In [43]:
df = pd.read_table("E:/CODING/GitHub Repositories/PandasCheatSheet/50_states.csv", sep = ",")
df.head()

        state     x     y
0     Alabama   139   -77
1      Alaska  -204  -170
2     Arizona  -203   -40
3    Arkansas    57   -53
4  California  -297    13
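The effect of the separator can be seen without any file on disk by reading from in-memory text. A minimal sketch, assuming two rows copied from the states data; all variable names are illustrative.

```python
import io
import pandas as pd

# Two rows from the states data, as comma-separated and tab-separated text.
comma_text = "state,x,y\nAlabama,139,-77\nAlaska,-204,-170\n"
tab_text = comma_text.replace(",", "\t")

from_csv = pd.read_csv(io.StringIO(comma_text))                     # comma by default
from_table = pd.read_table(io.StringIO(tab_text))                   # tab by default
from_table_comma = pd.read_table(io.StringIO(comma_text), sep=",")  # explicit comma

# With the wrong separator, each line is read as one single field.
wrong_sep = pd.read_table(io.StringIO(comma_text))
print(wrong_sep.shape)
```

A frame with a single column whose name contains the delimiters is usually the first sign that sep does not match the file.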


You can also read a file directly from the internet by passing its URL.

In [45]:
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/ToothGrowth.csv")
df.head()

    len  supp  dose
0   4.2    VC   0.5
1  11.5    VC   0.5
2   7.3    VC   0.5
3   5.8    VC   0.5
4   6.4    VC   0.5


You can read from your clipboard as well.

In [None]:
df = pd.read_clipboard(sep = "\t") #copy cells from Excel first, then set sep to match the copied data
df.head()

Besides these, pandas also supports reading a variety of other file formats, such as pickle, fwf (fixed-width format), Excel, JSON, HTML tables, HDF5 (HDFStore), Feather, Parquet, ORC, SAS, SPSS, Stata, SQL queries, and Google BigQuery.
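Most of these readers have a matching writer, so a quick round trip is an easy way to try a format. The sketch below uses only CSV and JSON, which need no extra packages; it works on in-memory text rather than files, and the sample rows and variable names are illustrative.

```python
import io
import pandas as pd

# Two rows copied from the states data shown earlier.
original = pd.DataFrame(
    {"state": ["Alabama", "Alaska"], "x": [139, -204], "y": [-77, -170]}
)

# CSV round trip: write to text, then read it back.
csv_text = original.to_csv(index=False)
from_csv = pd.read_csv(io.StringIO(csv_text))

# JSON round trip, using one JSON object per row.
json_text = original.to_json(orient="records")
from_json = pd.read_json(io.StringIO(json_text), orient="records")

print(from_csv)
print(from_json)
```

Some of the other formats, such as Excel and Parquet, rely on optional third-party packages that must be installed separately.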