## Pandas
Pandas is an open-source Python library providing high-performance, easy-to-use data structures and data analysis tools.

Features:
* Fast and efficient DataFrame object with default and customized indexing.
* Tools for loading data into in-memory data objects from different file formats.
* Data alignment and integrated handling of missing data.
* Reshaping and pivoting of data sets.
* Label-based slicing, indexing and subsetting of large data sets.
* Insertion and deletion of columns.
* Group-by operations for aggregation and transformation.
* High-performance merging and joining of data sets.
* Time-series functionality.
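Data alignment and missing-data handling from the list above can be seen with two small Series. A minimal sketch; the labels `'x'`, `'y'`, `'z'` and the values are made up for illustration:

```python
import pandas as pd

# Two hypothetical Series whose index labels only partly overlap
a = pd.Series([1.0, 2.0, 3.0], index=['x', 'y', 'z'])
b = pd.Series([10.0, 20.0], index=['y', 'z'])

# Arithmetic aligns on index labels, not on position;
# 'x' has no partner in b, so its result is NaN (missing)
total = a + b
print(total)

# Missing values can be counted, filled, or dropped
print(total.isna().sum())   # number of missing entries
filled = total.fillna(0)    # replace NaN with 0
dropped = total.dropna()    # drop labels whose value is NaN
print(filled)
print(dropped)
```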


In [1]:
```python
import pandas as pd

# read data from a csv file
df = pd.read_csv('nyc_weather.csv')
# df.to_csv('file_name.csv', index=False)  # saves df to a csv file, without the index column

# first five rows (use tail() for the last five)
print(df.head())
# column names
print('\ncolumns:\n', df.columns)
# maximum temperature
max_temp = df['maximum temperature'].max()
print('\nmaximum temp:', max_temp)
# dates on which the maximum temperature was reached
print(df.loc[df['maximum temperature'] == max_temp, 'date'])
# summary statistics of the average temperature
print('\navg temp description:\n', df['average temperature'].describe())

# note: df['date'] can also be written df.date; attribute access only works when the
# column name is a valid Python identifier, so df['maximum temperature'] needs brackets
```

```
       date  maximum temperature  minimum temperature  average temperature  \
0  1-1-2016                   42                   34                 38.0   
1  2-1-2016                   40                   32                 36.0   
2  3-1-2016                   45                   35                 40.0   
3  4-1-2016                   36                   14                 25.0   
4  5-1-2016                   29                   11                 20.0   

  precipitation snow fall snow depth  
0          0.00       0.0          0  
1          0.00       0.0          0  
2          0.00       0.0          0  
3          0.00       0.0          0  
4          0.00       0.0          0  

columns:
 Index(['date', 'maximum temperature', 'minimum temperature',
       'average temperature', 'precipitation', 'snow fall', 'snow depth'],
      dtype='object')

maximum temp: 96
204    23-7-2016
225    13-8-2016
Name: date, dtype: object

avg temp description:
 count    366.000000
mean   
```
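The file `nyc_weather.csv` is not included here, so the following sketch embeds the first five rows printed by the `head()` call (temperature columns only) as text and reads them through `io.StringIO`. It shows the same boolean-mask lookup of the hottest date on a subset where the answer is easy to check by eye:

```python
import io
import pandas as pd

# First five rows of nyc_weather.csv as printed above (temperature columns only),
# embedded as text so the example runs without the file
csv_text = """date,maximum temperature,minimum temperature,average temperature
1-1-2016,42,34,38.0
2-1-2016,40,32,36.0
3-1-2016,45,35,40.0
4-1-2016,36,14,25.0
5-1-2016,29,11,20.0
"""

df = pd.read_csv(io.StringIO(csv_text))

# Boolean mask: True for rows whose maximum equals the column maximum
max_temp = df['maximum temperature'].max()
hottest = df.loc[df['maximum temperature'] == max_temp, 'date']
print(max_temp)   # 45 for this subset
print(hottest)    # row 2, '3-1-2016'

# 'date' is a valid identifier, so attribute access returns the same column
print(df.date.equals(df['date']))
```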

In [2]:
```python
# build a DataFrame from a list of lists (a list of tuples or a dict of columns also works)
data = [['tom', 65, 180],
        ['harry', 78, 176],
        ['john', 61, 170]]

df_list = pd.DataFrame(data, columns=['name', 'weight', 'height'])
print('dataframe:\n', df_list)
print('\nshape:', df_list.shape)  # (rows, columns)
print('\nnames:\n', df_list['name'])
```

```
dataframe:
     name  weight  height
0    tom      65     180
1  harry      78     176
2   john      61     170

shape: (3, 3)

names:
 0      tom
1    harry
2     john
Name: name, dtype: object
```
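The cell above notes that a dict or tuples can be used instead of a list of lists. A minimal sketch rebuilding the same three rows both ways; with a dict, the keys become the column names and each value is one column:

```python
import pandas as pd

# Same data as above, given column-wise: dict keys become column names
data = {'name': ['tom', 'harry', 'john'],
        'weight': [65, 78, 61],
        'height': [180, 176, 170]}
df_dict = pd.DataFrame(data)

# A list of tuples works the same way as a list of lists (one tuple per row)
rows = [('tom', 65, 180), ('harry', 78, 176), ('john', 61, 170)]
df_tuples = pd.DataFrame(rows, columns=['name', 'weight', 'height'])

# equals() compares values, dtypes, index and column labels
print(df_dict.equals(df_tuples))
```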


In [3]:
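```python
# groupby: split the rows into groups by the values of a column, then aggregate each group
# (assumes a DataFrame with a 'city' column; nyc_weather.csv has none)

# group = df.groupby('city')
# group.max()   # one row per city: the maximum of each remaining column
```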
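Because the weather file has no `city` column, here is a runnable sketch of the group-by cell on a small hypothetical multi-city table (city names and numbers are invented for illustration):

```python
import pandas as pd

# Hypothetical multi-city table; rows are grouped by the 'city' column
df = pd.DataFrame({
    'city': ['new york', 'mumbai', 'new york', 'mumbai'],
    'temperature': [32, 90, 36, 85],
    'windspeed': [6, 5, 7, 12],
})

group = df.groupby('city')
print(group.max())                   # one row per city, max of every other column
print(group['temperature'].mean())   # aggregate a single column per group
print(group.get_group('mumbai'))     # the original rows belonging to one group
```

Group keys are sorted by default, so `mumbai` comes before `new york` in the results.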

In [4]:
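```python
# concat: stack DataFrames on top of each other (axis=1 places them side by side)

# df = pd.concat([df1, df2])
# df = pd.concat([df1, df2], ignore_index=True)  # renumber the index 0..n-1
```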
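A runnable sketch of the concat cell, assuming two small hypothetical frames with the same columns. It shows that row-wise concatenation keeps the original index labels unless `ignore_index=True` is passed, and that `axis=1` joins columns side by side:

```python
import pandas as pd

# Hypothetical frames with the same columns
df1 = pd.DataFrame({'city': ['new york', 'chicago'], 'temperature': [21, 14]})
df2 = pd.DataFrame({'city': ['mumbai', 'delhi'], 'temperature': [32, 45]})

# Row-wise: original index labels are kept, so they repeat
default = pd.concat([df1, df2])
print(list(default.index))   # [0, 1, 0, 1]

# ignore_index=True renumbers the rows 0..n-1
stacked = pd.concat([df1, df2], ignore_index=True)
print(list(stacked.index))   # [0, 1, 2, 3]

# Column-wise: rows are aligned on the index
df3 = pd.DataFrame({'humidity': [65, 68]})
wide = pd.concat([df1, df3], axis=1)
print(wide)
```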

In [5]:
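```python
# merge: SQL-style join on one or more key columns

# df = pd.merge(df1, df2, on='column_to_merge_on')               # inner join (default): only keys present in both
# df = pd.merge(df1, df2, on='column_to_merge_on', how='outer')  # full outer join: keeps all rows from both, NaN where no match
```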
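A runnable sketch of the merge cell on two hypothetical tables that share only some `city` keys, comparing the default inner join with `how='outer'` and `how='left'`:

```python
import pandas as pd

# Hypothetical tables; only 'new york' and 'chicago' appear in both
temps = pd.DataFrame({'city': ['new york', 'chicago', 'orlando'],
                      'temperature': [21, 14, 35]})
humid = pd.DataFrame({'city': ['chicago', 'new york', 'baltimore'],
                      'humidity': [65, 68, 71]})

inner = pd.merge(temps, humid, on='city')               # default how='inner'
outer = pd.merge(temps, humid, on='city', how='outer')  # keys from both sides
left = pd.merge(temps, humid, on='city', how='left')    # every row of temps

print(len(inner), len(outer), len(left))  # 2 4 3
print(outer)  # humidity is NaN for orlando, temperature is NaN for baltimore
```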

In [6]:
```python
# loc and iloc

# df.loc[i]    # row whose index label is i
# df.iloc[i]   # row at integer position i (0-based)

# with the default index (0, 1, 2, ...), label i and position i coincide,
# so both return the same row
```
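The difference between `loc` and `iloc` only shows up once the index is not 0, 1, 2, .... A minimal sketch using the name/weight data from earlier plus a hypothetical index `[10, 20, 30]`:

```python
import pandas as pd

df = pd.DataFrame({'name': ['tom', 'harry', 'john'],
                   'weight': [65, 78, 61]})

# Default index: label 1 and position 1 refer to the same row
print(df.loc[1, 'name'], df.iloc[1, 0])   # harry harry

# Hypothetical non-default index: labels no longer match positions
df2 = pd.DataFrame({'name': ['tom', 'harry', 'john'],
                    'weight': [65, 78, 61]},
                   index=[10, 20, 30])
print(df2.loc[20, 'name'])   # label 20 -> harry
print(df2.iloc[0, 0])        # position 0 -> tom

# Slices differ too: loc includes the end label, iloc excludes the end position
print(len(df2.loc[10:20]))   # 2 rows (labels 10 and 20)
print(len(df2.iloc[0:1]))    # 1 row (position 0 only)
```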