## Pandas

### What is Pandas?

Pandas is an open-source Python library for data manipulation, analysis and cleaning. It is well suited to many kinds of data, such as:

- Tabular data with heterogeneously-typed columns
- Ordered and unordered time series data
- Arbitrary matrix data with row & column labels
- Unlabelled data
- Any other form of observational or statistical data sets

### Why use Pandas?

- It handles missing data easily
- It provides Series for one-dimensional data and DataFrame for two-dimensional (tabular) data
- It provides an efficient way to slice the data
- It provides a flexible way to merge, concatenate or reshape the data
- It includes powerful tools for working with time series
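As a quick illustration of the missing-data point, the minimal sketch below (with made-up values) counts, fills and drops missing entries using `isna`, `fillna` and `dropna`; `np.nan` marks a missing value.

```python
import numpy as np
import pandas as pd

# Made-up data with two missing values (NaN)
s = pd.Series([1.0, np.nan, 3.0, np.nan])

print(s.isna().sum())  # number of missing values
print(s.fillna(0))     # replace missing values with 0
print(s.dropna())      # keep only the non-missing values
```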


#### Pandas Operations
- Slicing the data frame
- Merging & joining
- Concatenation
- Changing the index
- Changing column headers
- Data munging
- Use case: analyzing youth unemployment data
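Two of the operations listed above, changing the index and changing column headers, can be sketched as follows. The `Year`/`Rate` frame is made-up sample data, not the unemployment data set.

```python
import pandas as pd

# Made-up sample data
df = pd.DataFrame({"Year": [2001, 2002, 2003], "Rate": [6.0, 5.5, 5.8]})

# Changing the index: use the "Year" column as the row labels
by_year = df.set_index("Year")

# Changing column headers: rename selected columns with a mapping...
renamed = by_year.rename(columns={"Rate": "unemployment_rate"})

# ...or replace all headers at once on a copy
relabelled = by_year.copy()
relabelled.columns = ["rate"]

print(renamed)
print(relabelled)
```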
    
#### How to install Pandas?
To install pandas, open your command line or terminal and run `pip install pandas`. If you use Anaconda, run `conda install pandas` instead. Once the installation has finished, open your IDE (Jupyter, PyCharm, etc.) and import the library with `import pandas as pd`.
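To confirm that the installation worked, a quick check is to import the library and print the installed version:

```python
import pandas as pd

# Print the installed version to confirm that the import works
print(pd.__version__)
```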
    

```python
import pandas as pd

# A Series is a one-dimensional labelled array
r = pd.Series([1, 3, 5])
print(r)
```

```
0    1
1    3
2    5
dtype: int64
```
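Every Series pairs its values with an index; when none is given, pandas uses a default integer index starting at 0. A minimal sketch that inspects the parts separately:

```python
import pandas as pd

r = pd.Series([1, 3, 5])

print(r.index)       # the default RangeIndex: 0, 1, 2
print(r.to_numpy())  # the values as a NumPy array
print(r.dtype)       # the data type of the values
```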


```python
# A Series can hold strings as well as numbers
s = pd.Series(["a", "b", "c"])
print(s)
```

```
0    a
1    b
2    c
dtype: object
```


### Slicing a Series

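```python
print(s)        # the whole Series
print(s[1])     # the element with index label 1
print(s[:])     # all elements
print(s[1:])    # from the second element to the end
print(s[::-1])  # all elements in reverse order
```

```
0    a
1    b
2    c
dtype: object
b
0    a
1    b
2    c
dtype: object
1    b
2    c
dtype: object
2    c
1    b
0    a
dtype: object
```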
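Because `s[1]` above uses the default integer index, it is not obvious whether `1` is a label or a position. The sketch below uses a hypothetical index of 10, 20, 30 to show the difference, together with the explicit `.loc` (label-based) and `.iloc` (position-based) indexers.

```python
import pandas as pd

s = pd.Series(["a", "b", "c"], index=[10, 20, 30])

# With an integer index, s[10] looks up the *label* 10, not position 10
print(s[10])                   # prints: a
# .loc is always label-based, .iloc is always position-based
print(s.loc[20])               # prints: b
print(s.iloc[1])               # prints: b
# Label slices with .loc include the end label; .iloc slices exclude the end
print(s.loc[10:20].tolist())   # prints: ['a', 'b']
print(s.iloc[0:1].tolist())    # prints: ['a']
```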


The list below holds the public names in the `pandas` namespace, apparently copied from `dir(pd)` in an older release: it still contains names such as `Panel`, `SparseDataFrame`, `SparseSeries` and `read_msgpack`, which later pandas versions removed.

```python
# Public names in the pandas namespace of an older pandas release
pandas_names = [
    'Categorical', 'CategoricalDtype', 'CategoricalIndex', 'DataFrame',
    'DateOffset', 'DatetimeIndex', 'DatetimeTZDtype', 'ExcelFile',
    'ExcelWriter', 'Float64Index', 'Grouper', 'HDFStore', 'Index',
    'IndexSlice', 'Int16Dtype', 'Int32Dtype', 'Int64Dtype', 'Int64Index',
    'Int8Dtype', 'Interval', 'IntervalDtype', 'IntervalIndex', 'MultiIndex',
    'NaT', 'Panel', 'Period', 'PeriodDtype', 'PeriodIndex', 'RangeIndex',
    'Series', 'SparseArray', 'SparseDataFrame', 'SparseDtype',
    'SparseSeries', 'TimeGrouper', 'Timedelta', 'TimedeltaIndex',
    'Timestamp', 'UInt16Dtype', 'UInt32Dtype', 'UInt64Dtype', 'UInt64Index',
    'UInt8Dtype', 'api', 'array', 'arrays', 'bdate_range', 'compat',
    'concat', 'core', 'crosstab', 'cut', 'date_range', 'datetime',
    'describe_option', 'errors', 'eval', 'factorize', 'get_dummies',
    'get_option', 'infer_freq', 'interval_range', 'io', 'isna', 'isnull',
    'lreshape', 'melt', 'merge', 'merge_asof', 'merge_ordered', 'notna',
    'notnull', 'np', 'offsets', 'option_context', 'options', 'pandas',
    'period_range', 'pivot', 'pivot_table', 'plotting', 'qcut',
    'read_clipboard', 'read_csv', 'read_excel', 'read_feather', 'read_fwf',
    'read_gbq', 'read_hdf', 'read_html', 'read_json', 'read_msgpack',
    'read_parquet', 'read_pickle', 'read_sas', 'read_sql', 'read_sql_query',
    'read_sql_table', 'read_stata', 'read_table', 'reset_option',
    'set_eng_float_format', 'set_option', 'show_versions', 'test', 'testing',
    'timedelta_range', 'to_datetime', 'to_msgpack', 'to_numeric',
    'to_pickle', 'to_timedelta', 'tseries', 'unique', 'util',
    'value_counts', 'wide_to_long',
]
```
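To see which names exist in the pandas version you have installed, a minimal sketch is to filter `dir(pd)` for names that do not start with an underscore; the exact result varies between releases.

```python
import pandas as pd

# Public (non-underscore) names exposed by the installed pandas version
public_names = sorted(name for name in dir(pd) if not name.startswith("_"))
print(len(public_names))

# Older API names may or may not appear, depending on your version
print("Panel" in public_names)
```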


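```python
# Create a Series with custom index labels
m = pd.Series([3, 5, 8], index=["a", "b", "c"])
# or, equivalently, pass the index as the second positional argument
m = pd.Series([3, 5, 8], ["a", "b", "c"])
m
```

```
a    3
b    5
c    8
dtype: int64
```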
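Index labels matter most in arithmetic: pandas aligns two Series on their labels, and labels present in only one of them give NaN unless a `fill_value` is supplied. A sketch with a second, made-up Series `b`:

```python
import pandas as pd

a = pd.Series([3, 5, 8], index=["a", "b", "c"])
b = pd.Series([10, 20, 30], index=["b", "c", "d"])  # made-up values

# Values are added label by label; "a" and "d" exist in only one Series -> NaN
total = a + b
print(total)

# Treat a missing label as 0 instead of producing NaN
filled = a.add(b, fill_value=0)
print(filled)
```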

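```python
# Use a range of dates as the index; ISO "YYYY-MM-DD" strings avoid
# any ambiguity between day and month
s = pd.Series([34, 56, 33, 44], index=pd.date_range("1993-09-23", "1993-09-26"))
s
```

```
1993-09-23    34
1993-09-24    56
1993-09-25    33
1993-09-26    44
Freq: D, dtype: int64
```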
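The same index can be built with `periods` and `freq` instead of an end date. Once the index holds dates, rows can be selected with a `Timestamp` or sliced with date strings; a minimal sketch:

```python
import pandas as pd

# Four consecutive days starting on 1993-09-23
days = pd.date_range("1993-09-23", periods=4, freq="D")
s = pd.Series([34, 56, 33, 44], index=days)

# Select a single day with a Timestamp
print(s.loc[pd.Timestamp("1993-09-24")])             # prints: 56
# Slice with date strings; both end points are included
print(s.loc["1993-09-24":"1993-09-25"].tolist())     # prints: [56, 33]
```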

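```python
# Each dictionary key becomes a column; each list supplies that column's values
s = {'Raja': [25, 9052507933, 'Love', 4], 'Maha': [22, 23344444222, 'love', 3]}
r = pd.DataFrame(s)
r
```

|   | Raja | Maha |
|---|------|------|
| 0 | 25 | 22 |
| 1 | 9052507933 | 23344444222 |
| 2 | Love | love |
| 3 | 4 | 3 |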
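Because each list in `s` mixes numbers and a string, both columns of `r` get the generic `object` dtype. One possible fix, sketched below, is to transpose so each person becomes a row and then let `infer_objects()` pick a numeric dtype where a column holds only integers. The column names `field_0` to `field_3` are hypothetical, since the meaning of each value is not stated.

```python
import pandas as pd

# Same data as above: each column mixes integers and a string
r = pd.DataFrame({'Raja': [25, 9052507933, 'Love', 4],
                  'Maha': [22, 23344444222, 'love', 3]})
print(r.dtypes)  # both columns are object

# Hypothetical layout: one row per person, one column per field,
# so that each column holds a single type
tidy = r.T
tidy.columns = ["field_0", "field_1", "field_2", "field_3"]
tidy = tidy.infer_objects()  # integer-only columns become int64
print(tidy.dtypes)
```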


```python
print(r.head(1))  # first row
print(r.tail(1))  # last row
```

```
  Raja Maha
0   25   22
  Raja Maha
3    4    3
```
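`head()` and `tail()` return the first and last 5 rows when `n` is omitted, and a negative `n` passed to `head` drops that many rows from the end. A short sketch on a made-up 10-row frame:

```python
import pandas as pd

df = pd.DataFrame({"x": range(10)})  # made-up data: 0..9

print(len(df.head()))             # prints: 5 (default n=5)
print(df.tail(3)["x"].tolist())   # prints: [7, 8, 9]
print(df.head(-2)["x"].tolist())  # every row except the last 2
```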


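### Merging & Joining

```python
# Two frames with the same columns but different (year) indexes
r = pd.DataFrame({'raj': [1, 2, 3, 4, 6, 7], 'maha': [4, 6, 4, 5, 6, 7]},
                 index=[2001, 2002, 2003, 2004, 2005, 2006])
m = pd.DataFrame({'raj': [1, 2, 9, 4, 0, 7], 'maha': [4, 3, 9, 5, 6, 1]},
                 index=[2007, 2008, 2009, 2010, 2011, 2012])
# Without "on", merge joins on all shared columns (raj and maha) and keeps
# only the rows whose values match in both frames; the index is reset
mer = pd.merge(r, m)
mer
```

|   | raj | maha |
|---|-----|------|
| 0 | 1 | 4 |
| 1 | 4 | 5 |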
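`pd.merge` performs an inner join by default. The sketch below repeats the frames above and, as one alternative, uses `how="outer"` with `indicator=True` to keep every `(raj, maha)` pair and record in a `_merge` column where each row came from.

```python
import pandas as pd

r = pd.DataFrame({'raj': [1, 2, 3, 4, 6, 7], 'maha': [4, 6, 4, 5, 6, 7]},
                 index=[2001, 2002, 2003, 2004, 2005, 2006])
m = pd.DataFrame({'raj': [1, 2, 9, 4, 0, 7], 'maha': [4, 3, 9, 5, 6, 1]},
                 index=[2007, 2008, 2009, 2010, 2011, 2012])

# Default how="inner": only pairs present in both frames
inner = pd.merge(r, m)

# how="outer": every pair; _merge is "left_only", "right_only" or "both"
outer = pd.merge(r, m, how="outer", indicator=True)
print(outer)
```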


```python
r
```

|      | raj | maha |
|------|-----|------|
| 2001 | 1 | 4 |
| 2002 | 2 | 6 |
| 2003 | 3 | 4 |
| 2004 | 4 | 5 |
| 2005 | 6 | 6 |
| 2006 | 7 | 7 |


```python
m
```

|      | raj | maha |
|------|-----|------|
| 2007 | 1 | 4 |
| 2008 | 2 | 3 |
| 2009 | 9 | 9 |
| 2010 | 4 | 5 |
| 2011 | 0 | 6 |
| 2012 | 7 | 1 |


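```python
# Merge on the "raj" column only; the other shared column, maha, appears
# twice, so pandas adds the suffixes _x (left frame) and _y (right frame)
l = pd.merge(r, m, on='raj')
l
```

|   | raj | maha_x | maha_y |
|---|-----|--------|--------|
| 0 | 1 | 4 | 4 |
| 1 | 2 | 6 | 3 |
| 2 | 4 | 5 | 5 |
| 3 | 7 | 7 | 1 |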
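The `_x`/`_y` suffixes can be replaced with more descriptive ones through the `suffixes` parameter, and `how="left"` keeps every row of the left frame (unmatched keys get NaN). The suffixes `_r` and `_m` below are just illustrative choices:

```python
import pandas as pd

r = pd.DataFrame({'raj': [1, 2, 3, 4, 6, 7], 'maha': [4, 6, 4, 5, 6, 7]},
                 index=[2001, 2002, 2003, 2004, 2005, 2006])
m = pd.DataFrame({'raj': [1, 2, 9, 4, 0, 7], 'maha': [4, 3, 9, 5, 6, 1]},
                 index=[2007, 2008, 2009, 2010, 2011, 2012])

# Keep all rows of r; raj values 3 and 6 have no match in m -> NaN
left = pd.merge(r, m, on="raj", how="left", suffixes=("_r", "_m"))
print(left)
```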


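```python
r = pd.DataFrame({'raj': [1, 2, 3, 4, 6, 7, 4], 'maha': [4, 6, 4, 5, 6, 7, 5]},
                 index=[2001, 2002, 2003, 2004, 2005, 2006, 2007])
m = pd.DataFrame({'raj1': [8, 2, 9, 4, 0, 7, 6], 'maha1': [4, 3, 9, 5, 6, 1, 6]},
                 index=[2001, 2002, 2003, 2004, 2005, 2006, 2008])

# join matches rows on the index; by default it is a left join, so every
# row of r is kept and the year missing from m (2007) gets NaN
jo = r.join(m)
jo
```

|      | raj | maha | raj1 | maha1 |
|------|-----|------|------|-------|
| 2001 | 1 | 4 | 8.0 | 4.0 |
| 2002 | 2 | 6 | 2.0 | 3.0 |
| 2003 | 3 | 4 | 9.0 | 9.0 |
| 2004 | 4 | 5 | 4.0 | 5.0 |
| 2005 | 6 | 6 | 0.0 | 6.0 |
| 2006 | 7 | 7 | 7.0 | 1.0 |
| 2007 | 4 | 5 | NaN | NaN |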
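`DataFrame.join` also accepts a `how` parameter. A sketch with the same frames, comparing an inner join (only years in both) with an outer join (all years from either frame):

```python
import pandas as pd

r = pd.DataFrame({'raj': [1, 2, 3, 4, 6, 7, 4], 'maha': [4, 6, 4, 5, 6, 7, 5]},
                 index=[2001, 2002, 2003, 2004, 2005, 2006, 2007])
m = pd.DataFrame({'raj1': [8, 2, 9, 4, 0, 7, 6], 'maha1': [4, 3, 9, 5, 6, 1, 6]},
                 index=[2001, 2002, 2003, 2004, 2005, 2006, 2008])

inner = r.join(m, how="inner")  # years 2001-2006 only
outer = r.join(m, how="outer")  # years 2001-2008; 2007 and 2008 get NaN
print(outer)
```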


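### Concatenation

```python
r = pd.DataFrame({'raj': [1, 2, 3, 4, 6, 7, 3], 'maha': [4, 6, 4, 5, 6, 7, 4]},
                 index=[2001, 2002, 2003, 2004, 2005, 2006, 2013])
m = pd.DataFrame({'raj': [1, 2, 9, 4, 0, 7], 'maha': [4, 3, 9, 5, 6, 1]},
                 index=[2007, 2008, 2009, 2010, 2011, 2012])
# concat stacks the frames vertically in the order given (m first, then r)
# and keeps each frame's original index labels
c = pd.concat([m, r])
c
```

|      | raj | maha |
|------|-----|------|
| 2007 | 1 | 4 |
| 2008 | 2 | 3 |
| 2009 | 9 | 9 |
| 2010 | 4 | 5 |
| 2011 | 0 | 6 |
| 2012 | 7 | 1 |
| 2001 | 1 | 4 |
| 2002 | 2 | 6 |
| 2003 | 3 | 4 |
| 2004 | 4 | 5 |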
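`pd.concat` has options that control the index of the result. A sketch on shortened versions of `r` and `m` above: `ignore_index=True` renumbers the rows, `keys` labels each input block, and `sort_index()` puts the years in order.

```python
import pandas as pd

# Shortened versions of the frames above
r = pd.DataFrame({'raj': [1, 2, 3], 'maha': [4, 6, 4]}, index=[2001, 2002, 2003])
m = pd.DataFrame({'raj': [1, 2, 9], 'maha': [4, 3, 9]}, index=[2007, 2008, 2009])

# Stack rows and renumber the index 0..n-1
stacked = pd.concat([m, r], ignore_index=True)

# Label each block so the source of every row is still visible
keyed = pd.concat([m, r], keys=["m", "r"])
print(keyed.loc["r"])

# Stack rows, then sort by the year index
by_year = pd.concat([m, r]).sort_index()
print(by_year)
```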
