# Pandas for Data Wrangling

Each topic below is explained separately, with examples.

1. Introduction to Data Wrangling & Pandas
2. Series & DataFrames
3. Loading data into DataFrames
4. Indexing and selecting data
5. Working on TimeSeries Data
6. Merge, join, and concatenate dataframes
7. Reshaping and pivot tables
8. Working with text data
9. Working with missing data
10. Styling Pandas Table
11. Computational tools
12. Group By: split-apply-combine
13. Options and settings
14. Enhancing performance

<hr>

## 1. Introduction to Data Wrangling & Pandas
<hr>

### Data Wrangling
* Getting and reading data from different sources
* Cleaning data
* Shaping and structuring data
* Storing data
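
The four steps above can be sketched end to end in a few lines. This is only an illustration: the `city`/`month`/`sales` data is made up, and an in-memory string stands in for a real file or database.

```python
import io
import pandas as pd

# Hypothetical raw data; in practice this could be a file, an API, or a database.
raw_csv = io.StringIO(
    "city,month,sales\n"
    "Pune,Jan,100\n"
    "Pune,Feb,\n"
    "Delhi,Jan,80\n"
    "Delhi,Feb,120\n"
)

# 1. Get and read the data.
sales = pd.read_csv(raw_csv)

# 2. Clean: fill the missing sales figure with 0.
sales["sales"] = sales["sales"].fillna(0)

# 3. Shape: one row per city, one column per month.
wide = sales.pivot(index="city", columns="month", values="sales")

# 4. Store: write the result back out as CSV text.
out = io.StringIO()
wide.to_csv(out)
print(out.getvalue())
```

Filling with 0 is just one possible cleaning choice; dropping the row or interpolating may suit other data better.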

Many tools and libraries are available for data wrangling: tools such as RapidMiner and libraries such as pandas. Organizations often find libraries better suited because of their flexibility.

### Pandas
* A high-performance, easy-to-use open source library for data analysis
* Builds tabular data from different sources such as CSV, JSON, and databases
* Has utilities for descriptive statistics, aggregation, and handling missing data
* Provides database-style operations such as merge and join
* A fast, programmable, and easy alternative to spreadsheets
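
As a rough preview of the utilities listed above, the sketch below touches missing-data handling, descriptive statistics, aggregation, and a database-style merge. The `orders` and `customers` tables and their column names are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical tables, invented for illustration.
orders = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "amount": [10.0, np.nan, 30.0, 20.0],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Asha", "Ben", "Chen"],
})

# Handling missing data: replace the missing amount with 0.
orders["amount"] = orders["amount"].fillna(0.0)

# Descriptive statistics for a numeric column.
print(orders["amount"].describe())

# Aggregation: total amount per customer.
totals = orders.groupby("customer_id")["amount"].sum().reset_index()

# Database-style join on the shared key.
report = totals.merge(customers, on="customer_id", how="left")
print(report)
```

Each of these operations is covered in more detail in its own section of the outline.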

## 2. Series & DataFrames

In [1]:
import pandas as pd
import numpy as np

In [2]:
# Check the installed pandas version
pd.__version__

'0.25.1'

#### Series

* A Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating-point numbers, Python objects, etc.). Unlike a Python list, a Series has a single data type (dtype) shared by all of its values; mixed values are stored under the generic `object` dtype.
* A Series represents a single column.
* Each column has a data type.
* Multiple columns combine to form a table (i.e., a DataFrame).

In [3]:
# A Series takes 'data' and 'index' as input

# Note: every object I create in this notebook is prefixed with my own name.

In [4]:
jagadeesh_ser1 = pd.Series(data=[1,2,3,4,5], index=list('abcde'))
jagadeesh_ser1

a    1
b    2
c    3
d    4
e    5
dtype: int64

In [5]:
# Check the data type
jagadeesh_ser1.dtype

dtype('int64')
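
To see what "data of the same type" means in practice, here is a small sketch (the variable names are illustrative only). When the inputs differ in type, pandas picks one dtype that can hold all of them, falling back to the generic `object` dtype for mixed Python values.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])          # all integers: an integer dtype
floats = pd.Series([1, 2.5, 3])      # integers are upcast to float64
mixed = pd.Series([1, "two", 3.0])   # no common numeric type: object dtype

print(ints.dtype, floats.dtype, mixed.dtype)
```

An `object` Series still works, but numeric operations on it are typically slower, so it is usually worth keeping columns to a single concrete type.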

In [6]:
jagadeesh_ser2 = pd.Series(data=[11,22,33,44,55], index=list('abcde'))
jagadeesh_ser2

a    11
b    22
c    33
d    44
e    55
dtype: int64

#### DataFrame
* A DataFrame is a tabular representation of data.
* Multiple Series combine to create a DataFrame.
* Values that share the same index label belong to the same row.

In [7]:
# Create a DataFrame from a dict of Series
jagadeesh_df = pd.DataFrame({'A':jagadeesh_ser1, 'B':jagadeesh_ser2})

In [8]:
jagadeesh_df

   A   B
a  1  11
b  2  22
c  3  33
d  4  44
e  5  55
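
The two Series above share exactly the same index, so every row is complete. The sketch below, with made-up Series `left` and `right`, shows what happens when the indexes only partly overlap: rows are matched by label, not by position, and labels missing from one Series get `NaN`.

```python
import pandas as pd

left = pd.Series([1, 2, 3], index=list("abc"))
right = pd.Series([20, 30, 40], index=list("bcd"))

# Rows are aligned on index labels; the result covers the union a, b, c, d.
aligned = pd.DataFrame({"A": left, "B": right})
print(aligned)
```

Because `NaN` is a floating-point value, both columns become `float64` here even though the inputs were integers.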


In [9]:
# Create a DataFrame of random integers from 1 to 9 (randint's upper bound, 10, is exclusive); this needs NumPy
jagadeesh_df2 = pd.DataFrame(data=np.random.randint(1, 10, size=(10, 10)),
                             index=list('ABCDEFGHIJ'),
                             columns=list('abcdefghij'))
jagadeesh_df2

   a  b  c  d  e  f  g  h  i  j
A  5  1  7  1  7  5  4  7  6  6
B  9  8  1  2  7  6  3  8  2  3
C  2  5  1  9  5  1  4  3  9  8
D  1  5  5  5  8  5  1  9  1  1
E  1  2  4  1  4  9  8  5  2  9
F  5  1  2  6  9  9  1  2  6  9
G  5  6  4  3  1  8  3  6  1  4
H  5  6  9  8  1  9  6  4  5  6
I  9  2  9  7  7  2  4  8  6  1
J  8  8  3  1  7  5  8  8  4  4
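
Because the values are random, rerunning the cell above gives a different table each time. If repeatable output is wanted, one option is a seeded generator. The sketch below assumes NumPy 1.17 or newer (for `np.random.default_rng`); the helper name `random_frame` and the seed value are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

def random_frame(seed):
    """Build a small DataFrame of random integers in [1, 9] from a fixed seed."""
    rng = np.random.default_rng(seed)
    values = rng.integers(1, 10, size=(3, 4))  # upper bound 10 is exclusive
    return pd.DataFrame(values, index=list("ABC"), columns=list("abcd"))

first = random_frame(42)
second = random_frame(42)

# The same seed produces the same table.
print(first.equals(second))
```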
