# An Introduction to Pandas

Pandas is an open source library providing high-performance, easy-to-use data structures 
and data analysis tools for the Python programming language.


# Installation instructions for pandas

To install pandas, we will use the Anaconda distribution.

Installation instructions for Anaconda:

1) For Windows, go to the following link:
http://docs.continuum.io/anaconda/install/windows/

2) For Linux, go to the following link:
http://docs.continuum.io/anaconda/install/linux/

Now run the installer. It provides pandas and the rest of the SciPy stack, so nothing
else needs to be installed.
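
A quick way to confirm the installation worked is to import both libraries and print their versions. This is only a minimal check, and the version numbers printed will depend on the installer you downloaded.

```python
import numpy as np
import pandas as pd

# If either import fails, the installation did not complete correctly.
pandas_version = pd.__version__
numpy_version = np.__version__

print("pandas:", pandas_version)
print("numpy:", numpy_version)
```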

In [4]:
# Import the pandas and NumPy libraries
import pandas as pd
import numpy as np

# DataFrames and Series

Series and DataFrames are similar to arrays in NumPy: a Series is a one-dimensional labeled array, and a DataFrame is a two-dimensional labeled table.
Like a NumPy array, a pandas Series has a dtype. This is often a NumPy dtype.
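
As a small illustration of the dtype attribute, the sketch below builds two Series from plain Python lists and inspects the dtype pandas infers. The variable names `int_series` and `float_series` are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

# A list of integers is stored with an integer dtype.
int_series = pd.Series([1, 2, 3])

# NaN is a floating-point value, so its presence makes the whole Series float64.
float_series = pd.Series([1, 2, np.nan])

print(int_series.dtype)
print(float_series.dtype)

# In both cases the dtype object is a NumPy dtype.
print(isinstance(float_series.dtype, np.dtype))
```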

In [5]:
# Creating a Series by passing a list of values, letting pandas create a default integer index:
s = pd.Series([1, 3, 5, np.nan, 6, 8])
s

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64

In [13]:
# Creating a DataFrame
# Ex1: from a dict that maps column names to lists of values
d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
print(df)
print()

# Ex2: from a NumPy array, with a datetime index and labeled columns
dates = pd.date_range('20130101', periods=6)
print(dates)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
print()
print(list('ABCD'))
df

   col1  col2
0     1     3
1     2     4

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

['A', 'B', 'C', 'D']


                   A         B         C         D
2013-01-01  0.904632 -0.059137 -0.567054 -0.912706
2013-01-02 -0.805827  0.800359  0.698279 -2.131705
2013-01-03  0.725991  1.360938  0.498329  1.243244
2013-01-04 -0.922469  0.376714  0.538322  0.240073
2013-01-05  0.504744 -0.655517  0.498344 -0.870754
2013-01-06  1.532796 -0.063732 -0.027163 -0.755842


Because np.random.randn draws new random values on every run, the numbers differ each time the cell is executed. A different run gives, for example, the following DataFrame:

                   A         B         C         D
2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
2013-01-04  0.721555 -0.706771 -1.039575  0.271860
2013-01-05 -0.424972  0.567020  0.276232 -1.087401
2013-01-06 -0.673690  0.113648 -1.478427  0.524988
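
The two tables above differ because the values are random. If a repeatable result is wanted, one option is to draw the numbers from a seeded NumPy generator. The sketch below is an assumption about how one might do this, not part of the original notebook; the seed value 0 and the name `df_seeded` are arbitrary choices.

```python
import numpy as np
import pandas as pd

# A generator created with a fixed seed yields the same numbers on every run.
rng = np.random.default_rng(seed=0)  # seed value chosen arbitrarily

dates = pd.date_range("20130101", periods=6)
df_seeded = pd.DataFrame(
    rng.standard_normal((6, 4)),  # 6 rows, 4 columns of standard normal draws
    index=dates,
    columns=list("ABCD"),
)
print(df_seeded)
```

Re-running this cell prints the same values each time, as long as the seed is unchanged.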

# Operations on dtypes

In [14]:
# arithmetic
print(s+1)
print(s+.01)

0    2.0
1    4.0
2    6.0
3    NaN
4    7.0
5    9.0
dtype: float64
0    1.01
1    3.01
2    5.01
3     NaN
4    6.01
5    8.01
dtype: float64


In [15]:
# comparison
s==1

0     True
1    False
2    False
3    False
4    False
5    False
dtype: bool

In [16]:
# indexing
s.iloc[1:3]

1    3.0
2    5.0
dtype: float64

In [18]:
# Operate with other dtypes
s + s.iloc[1:3].astype('Int8')

0     NaN
1     6.0
2    10.0
3     NaN
4     NaN
5     NaN
dtype: float64

In [19]:
# Get the dtype of each column:
df.dtypes

A    float64
B    float64
C    float64
D    float64
dtype: object

Data of these dtypes can be merged, reshaped, or cast.

In [23]:
# Merge column A with columns B and C side by side (axis=1); the column dtypes are preserved
print(pd.concat([df[['A']], df[['B', 'C']]], axis=1))
pd.concat([df[['A']], df[['B', 'C']]], axis=1).dtypes

                   A         B         C
2013-01-01  0.904632 -0.059137 -0.567054
2013-01-02 -0.805827  0.800359  0.698279
2013-01-03  0.725991  1.360938  0.498329
2013-01-04 -0.922469  0.376714  0.538322
2013-01-05  0.504744 -0.655517  0.498344
2013-01-06  1.532796 -0.063732 -0.027163


A    float64
B    float64
C    float64
dtype: object

In [26]:
# Changing the type (column A is already float64, so the values are unchanged here)
df['A'].astype(float)

2013-01-01    0.904632
2013-01-02   -0.805827
2013-01-03    0.725991
2013-01-04   -0.922469
2013-01-05    0.504744
2013-01-06    1.532796
Freq: D, Name: A, dtype: float64
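
Of the three operations named above, reshaping is the one the cells in this section do not show. The sketch below is one possible illustration, assuming a small hypothetical frame `wide`: `melt` turns it into a long table with one row per value, `pivot` turns it back, and `astype` casts the integers to floats so that the dtype visibly changes.

```python
import pandas as pd

wide = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})

# Reshape wide -> long: one row per (original row, column name) pair.
long = wide.reset_index().melt(
    id_vars="index", var_name="column", value_name="value"
)

# Reshape long -> wide again.
restored = long.pivot(index="index", columns="column", values="value")

# Cast the integer columns to float64.
as_float = wide.astype("float64")

print(long)
print(restored)
print(as_float.dtypes)
```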

In [28]:
# Reductions and groupby operations

# Sum each column
print(df.sum())

# Group rows by the values in column B, then sum column A within each group
df.groupby('B')['A'].sum()


A    1.939868
B    1.759626
C    1.639058
D   -3.187690
dtype: float64


B
-0.655517    0.504744
-0.063732    1.532796
-0.059137    0.904632
 0.376714   -0.922469
 0.800359   -0.805827
 1.360938    0.725991
Name: A, dtype: float64
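
In the output above, every value in column B is distinct, so each group holds a single row and the sums simply repeat column A. Grouping is more informative when the key repeats. The sketch below assumes a hypothetical frame `demo` with a repeating `key` column to show rows being combined.

```python
import pandas as pd

demo = pd.DataFrame({
    "key": ["x", "y", "x", "y", "x"],
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Rows sharing the same key are collected into one group before summing.
totals = demo.groupby("key")["A"].sum()
print(totals)
```

Here the three "x" rows are added together, as are the two "y" rows, so the result has one entry per distinct key rather than one per row.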