# Pandas

Pandas is a Python library for data manipulation and analysis, including numerical and time series analysis.

## Basic Data Structures

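```python
import math
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Example DataFrame: the auto MPG sample dataset that ships with Bokeh
# (Bokeh's sample data may need to be downloaded or installed separately first)
from bokeh.sampledata.autompg import autompg as df
from scipy.stats import linregress
```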

### Series
A **Series** is a one-dimensional labeled array: a **single** column of values together with an index. A **DataFrame** is two-dimensional; you can think of it as a **collection of Series** that share the same index, one Series per column.

```python
my_series = pd.Series([4.6, 2.1, -4.0, 3.0])

print(my_series)
```

```
0    4.6
1    2.1
2   -4.0
3    3.0
dtype: float64
```


You can also print just the values, without the index (they come back as a NumPy array):

```python
print(my_series.values)
```
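Because a Series pairs values with an index, you can also give it your own labels and look values up by label. And, in line with the idea of a DataFrame as a collection of Series, selecting a single DataFrame column returns a Series. A minimal sketch (the labels `a` to `d` and the column names `x` and `y` are arbitrary examples):

```python
import pandas as pd

# A Series with explicit string labels instead of the default 0..n-1 index
labelled = pd.Series([4.6, 2.1, -4.0, 3.0], index=["a", "b", "c", "d"])
print(labelled["c"])          # label-based lookup -> -4.0
print(labelled.to_numpy())    # underlying values as a NumPy array

# Two Series sharing an index combine column-wise into a DataFrame
frame = pd.DataFrame({"x": labelled, "y": labelled * 2})

# Selecting one column of the DataFrame gives back a Series
col = frame["y"]
print(type(col).__name__)     # Series
```

`to_numpy()` returns the same values as `.values` and is the spelling the pandas documentation recommends in newer versions.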

### DataFrame

Create an empty DataFrame:

```python
df1 = pd.DataFrame()
```

You can also create an empty DataFrame with predefined columns:

```python
df1 = pd.DataFrame(columns=('Col 1', 'Col 2', 'Col3'))
df1.head()
```

| | Col 1 | Col 2 | Col3 |
|---|---|---|---|
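Columns created from labels alone carry no type information; pandas typically gives them the generic `object` dtype. If you already know the types, one option is to build the frame from empty, typed Series. This is a sketch; the dtypes chosen for each column are example assumptions:

```python
import pandas as pd

# Column labels only: no dtype information is attached to the columns
untyped = pd.DataFrame(columns=["Col 1", "Col 2", "Col3"])
print(untyped.dtypes)

# Empty, typed Series give each column a concrete dtype up front
typed = pd.DataFrame({
    "Col 1": pd.Series(dtype="int64"),
    "Col 2": pd.Series(dtype="float64"),
    "Col3": pd.Series(dtype="string"),
})
print(typed.dtypes)
```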


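A DataFrame can also be built from a list of rows, with the column labels passed as `columns`:

```python
a = pd.DataFrame([[1, 2, 3], [3, 4, 5]], columns=list('ABC'))
b = pd.DataFrame([[5, 2, 3], [7, 4, 5]], columns=list('BDE'))
c = pd.DataFrame([[11, 12, 13], [17, 14, 15]], columns=list('XYZ'))
a.head(2)
b.tail()  # only the last expression in a notebook cell is displayed
```

| | B | D | E |
|---|---|---|---|
| 0 | 5 | 2 | 3 |
| 1 | 7 | 4 | 5 |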
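`head(n)` returns the first `n` rows and `tail(n)` the last `n`; both default to `n=5`. With only two rows, `a.head(2)` and `b.tail()` therefore return the whole frame. A minimal sketch on a throwaway 10-row frame (the column name `n` is arbitrary):

```python
import pandas as pd

frame = pd.DataFrame({"n": range(10)})

first_three = frame.head(3)   # rows 0, 1, 2
last_five = frame.tail()      # default n=5: rows 5 to 9
print(first_three)
print(last_five)
```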


## Writing To and Reading From Files

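### CSV

Let's write the auto MPG DataFrame to a CSV file. Note: if you leave `index` at its default of `True`, the index is written as an extra, unlabeled first column, and reading the file back with `read_csv` gives you an extra column named `Unnamed: 0`.

```python
df.to_csv("autompg.csv", index=False)
```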
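To see the `Unnamed: 0` effect without touching the disk, the sketch below round-trips a tiny, made-up two-row frame through an in-memory string (`to_csv` returns a string when no path is given). It also shows the two usual fixes: `index=False` when writing, or `index_col=0` when reading.

```python
import io
import pandas as pd

frame = pd.DataFrame({"mpg": [18.0, 15.0], "cyl": [8, 8]})

# Default index=True: the index is written as an unlabeled first column
with_index = frame.to_csv()
print(with_index.splitlines()[0])   # ,mpg,cyl

# Reading it back turns that unlabeled column into "Unnamed: 0"
reread = pd.read_csv(io.StringIO(with_index))
print(list(reread.columns))         # ['Unnamed: 0', 'mpg', 'cyl']

# Fix 1: do not write the index at all
no_index = pd.read_csv(io.StringIO(frame.to_csv(index=False)))

# Fix 2: tell read_csv that the first column is the index
as_index = pd.read_csv(io.StringIO(with_index), index_col=0)
print(list(no_index.columns), list(as_index.columns))
```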

Read the CSV file back into a DataFrame:

```python
df1 = pd.read_csv("autompg.csv")
df1.head()
```

| | mpg | cyl | displ | hp | weight | accel | yr | origin | name |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307.0 | 130 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu |
| 1 | 15.0 | 8 | 350.0 | 165 | 3693 | 11.5 | 70 | 1 | buick skylark 320 |
| 2 | 18.0 | 8 | 318.0 | 150 | 3436 | 11.0 | 70 | 1 | plymouth satellite |
| 3 | 16.0 | 8 | 304.0 | 150 | 3433 | 12.0 | 70 | 1 | amc rebel sst |
| 4 | 17.0 | 8 | 302.0 | 140 | 3449 | 10.5 | 70 | 1 | ford torino |
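`read_csv` has many more options. Two that are handy when exploring a file are `nrows`, which stops parsing after a given number of data rows, and `dtype`, which overrides type inference for chosen columns. A small sketch, using an in-memory CSV with a few rows copied from the output above in place of the file on disk; the `int8` choice for `cyl` is just an example:

```python
import io
import pandas as pd

csv_text = """mpg,cyl,name
18.0,8,chevrolet chevelle malibu
15.0,8,buick skylark 320
18.0,8,plymouth satellite
"""

# nrows: parse only the first N data rows (useful for previewing a big file)
preview = pd.read_csv(io.StringIO(csv_text), nrows=2)

# dtype: force a specific type for selected columns instead of inferring it
typed = pd.read_csv(io.StringIO(csv_text), dtype={"cyl": "int8"})

print(len(preview), typed["cyl"].dtype)   # 2 int8
```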


If you only need a subset of the columns, pass their names to `usecols`:

```python
df2 = pd.read_csv("autompg.csv", usecols=['name', 'mpg'])
df2.head()
```

| | mpg | name |
|---|---|---|
| 0 | 18.0 | chevrolet chevelle malibu |
| 1 | 15.0 | buick skylark 320 |
| 2 | 18.0 | plymouth satellite |
| 3 | 16.0 | amc rebel sst |
| 4 | 17.0 | ford torino |
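Notice that the result lists `mpg` before `name`: `usecols` only selects columns, and they come back in the order they appear in the file, not in the order you listed them. A short sketch of how to get a specific order, plus the callable form of `usecols`, again on a small in-memory CSV standing in for the real file:

```python
import io
import pandas as pd

csv_text = "mpg,cyl,name\n18.0,8,chevrolet chevelle malibu\n15.0,8,buick skylark 320\n"

# usecols keeps file order, regardless of the order of the list
subset = pd.read_csv(io.StringIO(csv_text), usecols=["name", "mpg"])
print(list(subset.columns))      # ['mpg', 'name']

# Index the result with a list to choose the column order explicitly
reordered = subset[["name", "mpg"]]
print(list(reordered.columns))   # ['name', 'mpg']

# usecols also accepts a callable, evaluated against each column name
no_cyl = pd.read_csv(io.StringIO(csv_text), usecols=lambda col: col != "cyl")
```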


You can also use a different delimiter, such as a tab, by passing it as `sep`:

```python
df1.to_csv("autompg.tsv", index=False, sep="\t")
df1 = pd.read_csv("autompg.tsv", sep="\t")
df1.head()
```

| | mpg | cyl | displ | hp | weight | accel | yr | origin | name |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307.0 | 130 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu |
| 1 | 15.0 | 8 | 350.0 | 165 | 3693 | 11.5 | 70 | 1 | buick skylark 320 |
| 2 | 18.0 | 8 | 318.0 | 150 | 3436 | 11.0 | 70 | 1 | plymouth satellite |
| 3 | 16.0 | 8 | 304.0 | 150 | 3433 | 12.0 | 70 | 1 | amc rebel sst |
| 4 | 17.0 | 8 | 302.0 | 140 | 3449 | 10.5 | 70 | 1 | ford torino |
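The separator used for reading has to match the one used for writing. The sketch below, on a small made-up frame kept in memory, shows what happens otherwise: with the default `sep=","`, a tab-separated file is parsed as a single column because no commas are found.

```python
import io
import pandas as pd

frame = pd.DataFrame({"mpg": [18.0, 15.0], "name": ["amc rebel sst", "ford torino"]})

# Write tab-separated text (no path given, so to_csv returns a string)
tsv_text = frame.to_csv(index=False, sep="\t")

# Matching separator: both columns are restored
ok = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# Default sep=",": the tabs are not split, so everything lands in one column
wrong = pd.read_csv(io.StringIO(tsv_text))

print(ok.shape, wrong.shape)   # (2, 2) (2, 1)
```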


### Handling Large Data Files

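You will often work with files of 1 GB or more, which can be slow or impossible to load into memory in one step. Passing `chunksize` to `read_csv` lets you read such a file in pieces instead: rather than a single DataFrame, you get an iterator that yields DataFrames of at most `chunksize` rows.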

```python
# Build a larger test file by stacking 1000 copies of df1
# (DataFrame.append was removed in pandas 2.0; pd.concat replaces it)
large_df = pd.concat([df1] * 1000, ignore_index=True)
large_df.to_csv("large.csv", index=False)

# Each iteration receives a DataFrame of (at most) 100 rows
for chunk in pd.read_csv('large.csv', chunksize=100):
    print(chunk.head(1))
```

```
         mpg  cyl  displ  hp  weight  accel  yr  origin            name
198700  29.9    4   98.0  65    2380   20.7  81       1  ford escort 2h
         mpg  cyl  displ  hp  weight  accel  yr  origin                   name
198800  24.0    4  113.0  95    2278   15.5  72       3  toyota corona hardtop
         mpg  cyl  displ   hp  weight  accel  yr  origin                 name
198900  16.0    8  318.0  150    4498   14.5  75       1  plymouth grand fury
         mpg  cyl  displ   hp  weight  accel  yr  origin  \
199000  20.6    6  231.0  105    3380   15.8  78       1   

                         name  
199000  buick century special  
         mpg  cyl  displ   hp  weight  accel  yr  origin             name
199100  25.4    6  168.0  116    2900   12.6  81       3  toyota cressida
         mpg  cyl  displ   hp  weight  accel  yr  origin              name
199200  14.0    8  351.0  153    4129   13.0  72       1  ford galaxi
```
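Printing each chunk only shows that the file is being split. A more typical pattern, sketched here under the assumption that you need an aggregate such as a column mean, is to update running totals chunk by chunk so that only one chunk is in memory at a time. The in-memory CSV is hypothetical data standing in for a large file on disk:

```python
import io
import pandas as pd

# Hypothetical stand-in for a large file: 1000 generated rows held in memory
csv_text = "mpg,cyl\n" + "\n".join(
    f"{10 + i % 20},{4 + 2 * (i % 3)}" for i in range(1000)
)

total = 0.0
count = 0
# Each chunk is an ordinary DataFrame of at most 100 rows
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=100):
    total += chunk["mpg"].sum()
    count += len(chunk)

# Mean computed from the running totals, without ever loading all rows at once
chunked_mean = total / count
print(chunked_mean)
```

This works for aggregates that can be combined from partial results (sums, counts, minima, maxima); something like a median needs a different strategy.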

## Working with Excel

Use `pd.ExcelWriter` to write one or more DataFrames to an Excel workbook, one sheet per `to_excel` call:

```python
df2 = pd.DataFrame([{'Name': 'Steve Jobs', 'Company': 'Apple'},
                    {'Name': 'Bill Gates', 'Company': 'Microsoft'}])

# Open the workbook; the with-block saves and closes it on exit
# (writer.save() was removed in pandas 2.0; writing .xlsx needs an engine such as openpyxl)
with pd.ExcelWriter('test_workbook.xlsx') as writer:
    # Write each DataFrame to its own sheet
    df1.to_excel(writer, sheet_name="Sheet1")
    df2.to_excel(writer, sheet_name="Sheet2")
```

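By default, `pd.read_excel` reads only the first sheet of the workbook (`sheet_name=0`). As with CSV files, `to_excel` writes the index by default, and it comes back as an extra `Unnamed: 0` column unless you pass `index=False` when writing (or `index_col=0` when reading):

```python
df1 = pd.read_excel('test_workbook.xlsx')
df1.head()
```

| | Unnamed: 0 | Unnamed: 0.1 | Col 1 | Col 2 | Col3 |
|---|---|---|---|---|---|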


To read a different sheet, pass its name (or zero-based position) as `sheet_name`:

```python
df1 = pd.read_excel('test_workbook.xlsx', sheet_name='Sheet2')
df1.head()
```

Requests is a Python library that lets you send HTTP/1.1 requests and add headers, form data, multipart files, and parameters using plain Python dictionaries.

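A DataFrame can also be built from a dictionary that maps each column name to a list of values:

```python
import pandas as pd
df = pd.DataFrame({'col1': [2, 3, 4], 'col2': [3, 4, 5], 'col3': [6, 7, 8]})
df
```

| | col1 | col2 | col3 |
|---|---|---|---|
| 0 | 2 | 3 | 6 |
| 1 | 3 | 4 | 7 |
| 2 | 4 | 5 | 8 |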
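The same kind of table can be built from other shapes of input. The sketch below compares three constructors on made-up values: a dict of columns (as in the cell above), a list of dicts with one dict per row (as in the Excel example), and `DataFrame.from_dict` with `orient="index"`, where the dict keys become row labels. The row labels `r1`/`r2` are hypothetical.

```python
import pandas as pd

# Dict of column name -> list of values: keys become columns
by_column = pd.DataFrame({"col1": [2, 3, 4], "col2": [3, 4, 5], "col3": [6, 7, 8]})

# List of dicts, one per row: keys become columns, missing keys become NaN
by_row = pd.DataFrame([{"col1": 2, "col2": 3}, {"col1": 3, "col3": 7}])

# from_dict with orient="index": keys become the row labels instead
by_index = pd.DataFrame.from_dict(
    {"r1": [2, 3, 6], "r2": [3, 4, 7]},
    orient="index",
    columns=["col1", "col2", "col3"],
)

print(by_row)
print(by_index)
```

The list-of-dicts form is convenient when records arrive one at a time (for example, parsed JSON), while the dict-of-lists form matches data that is already organised by column.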
