# Python - Pandas

- Pandas is an open-source Python library.
- It provides efficient data structures and data analysis tools for the Python programming language.
- Python with pandas is used in a variety of domains such as academia, finance, economics, and statistics.
- It has historically dealt with the following three data structures:
    1. Series
    2. DataFrame
    3. Panel (deprecated in pandas 0.20 and removed in 0.25; current versions provide only Series and DataFrame)
- These data structures are built on top of NumPy arrays, which makes them fast and efficient.

# Dimension and Description

- A higher-dimensional data structure is a container of lower-dimensional data structures.
- A DataFrame is a container of Series, and a Panel was a container of DataFrames.
- A Series is a one-dimensional labeled collection of elements of the same type, for example integers, floats, or strings.
- Points to consider:
    1. It is a collection of similar elements.
    2. Its size is treated as fixed (i.e., it is size-immutable).
    3. The values of the data can be changed (i.e., it is value-mutable).
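
The points above can be illustrated with a minimal sketch. The series contents and labels here are made up for illustration and are not part of the original examples.

```python
import pandas as pd

# A small series with illustrative values and labels
s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

# All elements share a single dtype (a collection of similar elements)
print(s.dtype)

# Values can be changed in place (value-mutable)
s['b'] = 20
print(s)
```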

### Examples 

In [1]:
# import the pandas library and alias it as pd
import pandas as pd
s = pd.Series(dtype='float64')  # empty series; an explicit dtype avoids version-dependent defaults
print(s)

Series([], dtype: float64)


In [2]:
# import the pandas library and aliasing as pd
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data)
print(s)

0    a
1    b
2    c
3    d
dtype: object


In [3]:
# import the pandas library and aliasing as pd
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data, index=[10,11,12,13])
print(s)

10    a
11    b
12    c
13    d
dtype: object


# Create a series from dictionary

- A dictionary can be passed as input.
- If no index is specified, the dictionary keys become the index. Recent pandas versions keep the keys in insertion order; older versions sorted them.
- If an index is passed, the values corresponding to the labels in the index are pulled out of the data.

### Examples

In [10]:
# import the pandas library and aliasing as pd
import pandas as pd
import numpy as np
data = {'a':0.,'b':1.,'c':2,'d':3.}
s = pd.Series(data)
print(s)

s1 = pd.Series(data, index=['c','a','d','b'])
print(s1)

a    0.0
b    1.0
c    2.0
d    3.0
dtype: float64
c    2.0
a    0.0
d    3.0
b    1.0
dtype: float64
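
As a hedged follow-up to the example above, the sketch below shows what happens when the index contains a label that is not a key in the dictionary. The label `'z'` is a hypothetical choice made only for this illustration.

```python
import pandas as pd

data = {'a': 0.0, 'b': 1.0, 'c': 2.0}

# 'z' is not a key in data, so its value is filled with NaN;
# 'a' is a key but is not requested, so it is left out
s = pd.Series(data, index=['b', 'c', 'z'])
print(s)
```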


# Create a series from a Scalar

- If the data is a scalar value, then an index must be provided.
- The value is repeated to match the length of the index.

In [1]:
# import the pandas library and aliasing as pd
import pandas as pd
import numpy as np
s = pd.Series(5, index=[0,1,2,3])
print(s)

0    5
1    5
2    5
3    5
dtype: int64


# Accessing data from series with position

- Data in a series can be accessed by position, much as in a NumPy ndarray.
- Counting starts from 0, so the first element is stored at position 0, the second at position 1, and so on.
- Example: retrieve the first element.

In [7]:
# import the pandas library and aliasing as pd
import pandas as pd
import numpy as np
s = pd.Series([1,2,3,4,5], index=['a','b','c','d','e']) 
print(s.iloc[0])  # retrieve the first element by position

1


In [8]:
print(s[:3]) # retrieve the first three elements

a    1
b    2
c    3
dtype: int64


In [9]:
print(s[-3:]) # retrieve the last three elements

c    3
d    4
e    5
dtype: int64
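
When the index itself holds integers, a plain `s[...]` lookup is interpreted as a label rather than a position. The sketch below, which reuses the integer-labeled series from an earlier example, uses `.iloc` to make positional access explicit.

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c', 'd'], index=[10, 11, 12, 13])

print(s[10])       # 10 is treated as a label here, not a position
print(s.iloc[0])   # explicit positional access: the first element
print(s.iloc[-1])  # explicit positional access: the last element
```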


# Retrieve the data using label (Index)

- A series is like a fixed-size dict: values can be read and set by index label.
- Example: retrieve a single element using its index label.
- Retrieve multiple elements by passing a list of index labels.

In [10]:
print(s['a']) # retrieve a single element

1


In [12]:
print(s[['a','b','c','d']]) # retrieve multiple elements

a    1
b    2
c    3
d    4
dtype: int64
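
The dict-like behaviour can be sketched a little further. The example below is only an illustration: it uses `.loc` for explicit label-based access and shows two ways to handle a label that may be absent. The label `'f'` is a hypothetical missing label.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# Set a value by label, as with a dict
s.loc['a'] = 100
print(s.loc['a'])

# Membership tests check the index labels, not the values
print('f' in s)

# get() returns a default instead of raising KeyError for a missing label
print(s.get('f', 'missing'))
```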


# Data Frame

- It is a 2-dimensional structure whose columns can hold different data types (i.e., it is a heterogeneous collection of data elements).
- Data is stored in tabular format, in rows and columns.
- For example, consider the following dataframe:

|   | Name   | Dept | Sem | Percentage |
|---|--------|------|-----|------------|
| 1 | Sam    | ECE  | I   | 78         |
| 2 | Geetha | CSE  | II  | 85         |
| 3 | Kala   | ECE  | III | 75         |
| 4 | Mala   | CSE  | IV  | 70         |

# Features of Dataframe

- Columns can be of different types.
- The size of the dataframe can be changed (i.e., it is size-mutable).
- Both axes are labeled (rows and columns).
- Various arithmetic operations can be performed on rows and columns.

### Pandas.DataFrame
- A pandas dataframe can be created using the following constructor.
- pandas.DataFrame(data,index,columns,dtype,copy)
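
A minimal sketch of how the constructor parameters fit together is given below. The array contents and the labels `r1`, `r2`, `x`, and `y` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

values = np.array([[1, 2], [3, 4]])

df = pd.DataFrame(
    data=values,          # the data itself
    index=['r1', 'r2'],   # row labels
    columns=['x', 'y'],   # column labels
    dtype=float,          # force a dtype for all columns
    copy=True,            # copy the input instead of sharing memory with it
)
print(df)
print(df.dtypes)
```

Because `copy=True` is passed, later changes to `values` should not show up in `df`.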

### DataFrame Creation
- A pandas dataframe can be created by using various inputs like
    1. Lists
    2. Dict
    3. Series
    4. Numpy arrays
    5. Another DataFrame
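
The list and dict inputs are demonstrated later in this document. For the remaining input types, a hedged sketch follows. The variable names and values are illustrative only.

```python
import numpy as np
import pandas as pd

# From a Series: the series name becomes the column label
from_series = pd.DataFrame(pd.Series([1, 2, 3], name='n'))

# From a 2-D NumPy array: column labels are supplied separately
from_array = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['a', 'b'])

# From another DataFrame: here only column 'b' is kept
from_frame = pd.DataFrame(from_array, columns=['b'])

print(from_series)
print(from_array)
print(from_frame)
```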

### Data types of columns
- Name - String
- Dept - String
- Semester - String
- Percentage - Integer

- The dataframe is a heterogeneous collection of data (its columns hold different types of elements).
- Its size can be changed (size-mutable).
- Its data can be changed (data-mutable).
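
These properties can be checked against the example table from the Data Frame section. The sketch below rebuilds that table. The `Passed` column and its threshold of 75 are hypothetical additions used only to demonstrate size mutability.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'Name': ['Sam', 'Geetha', 'Kala', 'Mala'],
        'Dept': ['ECE', 'CSE', 'ECE', 'CSE'],
        'Sem': ['I', 'II', 'III', 'IV'],
        'Percentage': [78, 85, 75, 70],
    },
    index=[1, 2, 3, 4],
)

# Each column has its own dtype: strings for the first three, integers for the last
print(df.dtypes)

# Data-mutable: change a single value by row label and column label
df.loc[1, 'Percentage'] = 80

# Size-mutable: add a column (hypothetical pass mark of 75)
df['Passed'] = df['Percentage'] >= 75
print(df)
```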

# DataFrame Creation

- The most basic dataframe is an empty DataFrame, created by calling the constructor with no arguments.
- A pandas dataframe consists of 3 principal components: the data, the rows, and the columns.
- A dataframe can be created from a list, a dictionary, or a list of dictionaries.

In [14]:
# import the pandas library and aliasing as pd
import pandas as pd
df = pd.DataFrame() # empty dataframe
print(df)

Empty DataFrame
Columns: []
Index: []


In [15]:
lst = ['sun','earth','mars','venus','jupiter','moon','saturn']
df = pd.DataFrame(lst)
print(df)

         0
0      sun
1    earth
2     mars
3    venus
4  jupiter
5     moon
6   saturn


In [16]:
data = [['Rose',10],['Bob',12],['Sunny',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
print(df)

    Name  Age
0   Rose   10
1    Bob   12
2  Sunny   13


In [17]:
data = {'Name':['Tom','Sam','Vicky','Steve'],'Age':[28,34,29,42]}
df = pd.DataFrame(data)
print(df)

    Name  Age
0    Tom   28
1    Sam   34
2  Vicky   29
3  Steve   42


### Create a dataframe from a list of dicts
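A list of dictionaries can be passed as input to create a data frame. The dictionary keys become the column names, and a key missing from a dictionary produces NaN in that row.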

In [1]:
import pandas as pd
data = [{'a':3,'b':5},{'a':5,'b':10,'c':20}]
df = pd.DataFrame(data)
print(df)

   a   b     c
0  3   5   NaN
1  5  10  20.0


### Create a data frame by passing a list of dictionaries and row indices 

In [2]:
df = pd.DataFrame(data, index=['first','second'])
print(df)

        a   b     c
first   3   5   NaN
second  5  10  20.0


In [4]:
# create a dataframe with a list of dictionaries, row indices and column indices
import pandas as pd
data = [{'a':1,'b':2},{'a':5,'b':10,'c':20}]
df1 = pd.DataFrame(data, index=['first','second'], columns=['a','b'])  # both column labels match dictionary keys
df2 = pd.DataFrame(data, index=['first','second'], columns=['a','b1']) # 'b1' matches no dictionary key, so that column is all NaN
print(df1)
print(df2)

        a   b
first   1   2
second  5  10
        a  b1
first   1 NaN
second  5 NaN


### Column Selection
A single column can be selected from the dataframe by its label; the result is a Series.

In [5]:
import pandas as pd
d = {'one':pd.Series([1,2,3], index=['a','b','c']), 'two':pd.Series([1,2,3,4], index=['a','b','c','d'])}
df = pd.DataFrame(d)
print(df['one'])

a    1.0
b    2.0
c    3.0
d    NaN
Name: one, dtype: float64
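
As a small extension of the example above, the sketch below contrasts selecting one column with selecting several. Passing a list of labels is assumed here to be the intended way to get more than one column, and it returns a DataFrame rather than a Series.

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

single = df['one']           # one label: a Series
subset = df[['one', 'two']]  # list of labels: a DataFrame

print(type(single))
print(type(subset))
```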


### Column Addition 
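A new column can be added by assigning a Series, or an expression built from existing columns, to a new column label. Values are aligned on the row index, so rows without a matching label get NaN.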

In [9]:
import pandas as pd
d = {'one':pd.Series([1,2,3], index=['a','b','c']), 'two':pd.Series([1,2,3,4], index=['a','b','c','d'])}
df = pd.DataFrame(d)

# adding a new column to an existing DataFrame object with column label by passing new series
print("Adding a new column by passing as Series")
df['three'] = pd.Series([10,20,30], index=['a','b','c'])
print(df)

print("Adding a new column using the existing columns in DataFrame")
df['four'] = df['one'] + df['three']
print(df)

Adding a new column by passing as Series
   one  two  three
a  1.0    1   10.0
b  2.0    2   20.0
c  3.0    3   30.0
d  NaN    4    NaN
Adding a new column using the existing columns in DataFrame
   one  two  three  four
a  1.0    1   10.0  11.0
b  2.0    2   20.0  22.0
c  3.0    3   30.0  33.0
d  NaN    4    NaN   NaN
