
# Pandas Data Structures

In [None]:
import pandas as pd

## Series

In [None]:
s= pd.Series([3,-5,7,4],index=['a','b','c','d'])

In [None]:
s

a    3
b   -5
c    7
d    4
dtype: int64

## DataFrame

In [None]:
data={'Country':['Belgium','India','Brazil'],'Capital':['Brussels','New Delhi','Brasilia'],'Population':[11190846,1303171035,207847528]}

In [None]:
df=pd.DataFrame(data,columns=['Country','Capital','Population'])

In [None]:
df

   Country    Capital  Population
0  Belgium   Brussels    11190846
1    India  New Delhi  1303171035
2   Brazil   Brasilia   207847528


## Asking For Help

In [None]:
help(pd.Series.loc)

Help on property:

    Access a group of rows and columns by label(s) or a boolean array.
    
    ``.loc[]`` is primarily label based, but may also be used with a
    boolean array.
    
    Allowed inputs are:
    
    - A single label, e.g. ``5`` or ``'a'``, (note that ``5`` is
      interpreted as a *label* of the index, and **never** as an
      integer position along the index).
    - A list or array of labels, e.g. ``['a', 'b', 'c']``.
    - A slice object with labels, e.g. ``'a':'f'``.
    
          start and the stop are included
    
    - A boolean array of the same length as the axis being sliced,
      e.g. ``[True, False, True]``.
    - A ``callable`` function with one argument (the calling Series or
      DataFrame) and that returns valid output for indexing (one of the above)
    
    See more at :ref:`Selection by Label <indexing.label>`
    
    Raises
    ------
    KeyError
        If any items are not found.
    
    See Also
    --------
    DataFrame.at : Access
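
The help text above lists the inputs `.loc[]` accepts. The cell below is a small sketch of those cases on a Series like `s`; the name `s_demo` and the other variable names are illustrative and not part of the original notebook. Note in particular that a label slice includes its stop label, while a positional `.iloc[]` slice does not.

```python
import pandas as pd

# Illustrative copy of the Series used in this notebook
s_demo = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])

# Label slice: both the start label 'a' and the stop label 'c' are included
by_label = s_demo.loc['a':'c']

# Positional slice for comparison: the stop position 2 is excluded
by_position = s_demo.iloc[0:2]

# Boolean list of the same length as the index
by_mask = s_demo.loc[[True, False, True, False]]

# Callable that receives the Series and returns a valid indexer
by_callable = s_demo.loc[lambda x: x > 3]

print(by_label, by_position, by_mask, by_callable, sep='\n\n')
```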

## Basic Information

In [30]:
df.shape

(3, 3)

In [31]:
df.index

RangeIndex(start=0, stop=3, step=1)

In [32]:
df.columns

Index(['Country', 'Capital', 'Population'], dtype='object')

In [33]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Country     3 non-null      object
 1   Capital     3 non-null      object
 2   Population  3 non-null      int64 
dtypes: int64(1), object(2)
memory usage: 200.0+ bytes


In [34]:
df.count()

Country       3
Capital       3
Population    3
dtype: int64

## Summary

In [49]:
df.sum()

Country              BelgiumIndiaBrazil
Capital       BrusselsNew DelhiBrasilia
Population                   1522209409
dtype: object

In [50]:
df.cumsum()

              Country                    Capital  Population
0             Belgium                   Brussels    11190846
1        BelgiumIndia          BrusselsNew Delhi  1314361881
2  BelgiumIndiaBrazil  BrusselsNew DelhiBrasilia  1522209409


In [37]:
df.min()

Country        Belgium
Capital       Brasilia
Population    11190846
dtype: object

In [38]:
df.max()

Country            India
Capital        New Delhi
Population    1303171035
dtype: object

In [47]:
df['Population'].idxmin()

0

In [48]:
df['Population'].idxmax()

1

In [41]:
df.describe()

         Population
count           3.0
mean    507403100.0
std     696134600.0
min      11190850.0
25%     109519200.0
50%     207847500.0
75%     755509300.0
max    1303171000.0


In [42]:
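# numeric_only=True skips the string columns; pandas 2.0+ raises a TypeError without it
df.mean(numeric_only=True)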

Population    5.074031e+08
dtype: float64

In [43]:
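# numeric_only=True skips the string columns; pandas 2.0+ raises a TypeError without it
df.median(numeric_only=True)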

Population    207847528.0
dtype: float64

## Selection

In [None]:
s['b']

-5

In [None]:
df[1:]

  Country    Capital  Population
1   India  New Delhi  1303171035
2  Brazil   Brasilia   207847528


## By Position

In [None]:
df.iloc[[0],[0]]

   Country
0  Belgium


In [None]:
df.iat[0,0]

'Belgium'

## By Label

In [None]:
df.loc[[0],['Country']]

   Country
0  Belgium


In [None]:
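# .iloc[] is strictly position-based, so passing a column label raises an IndexError
df.iloc[[0],['Country']]

IndexError: .iloc requires numeric indexers, got ['Country']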

In [None]:
df.at[0 ,'Country']

'Belgium'

## By Label/ Position

### `ix[]` was deprecated in pandas 0.20.0 and removed in pandas 1.0; use `loc[]` or `iloc[]` instead
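
Code that relied on mixed label/position access can usually be rewritten with `loc[]` or `iloc[]` alone. The cell below is one possible sketch, assuming a frame shaped like `df`; the names `df_demo`, `via_loc`, and `via_iloc` are illustrative. It picks a row by position and a column by label in two equivalent ways.

```python
import pandas as pd

# Illustrative copy of the DataFrame used in this notebook
df_demo = pd.DataFrame({
    'Country': ['Belgium', 'India', 'Brazil'],
    'Capital': ['Brussels', 'New Delhi', 'Brasilia'],
    'Population': [11190846, 1303171035, 207847528],
})

# Option 1: translate the row position into its label, then use .loc[]
via_loc = df_demo.loc[df_demo.index[1], 'Capital']

# Option 2: translate the column label into its position, then use .iloc[]
via_iloc = df_demo.iloc[1, df_demo.columns.get_loc('Capital')]

print(via_loc, via_iloc)
```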

## Boolean Indexing

In [None]:
# Use ~ to negate a boolean mask; unary minus on booleans is not supported by current NumPy/pandas
s[~(s>1)]

b   -5
dtype: int64

In [26]:
s[(s<-1)|(s>2)]

a    3
b   -5
c    7
d    4
dtype: int64

In [27]:
df[df['Population']>1200000000]

  Country    Capital  Population
1   India  New Delhi  1303171035
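
Boolean masks combine with the element-wise operators `~` (not), `|` (or), and `&` (and). Each comparison needs its own parentheses because these operators bind more tightly than `<` and `>`. The cell below is a minimal sketch on a Series like the original `s`; all variable names are illustrative.

```python
import pandas as pd

# Illustrative copy of the Series as it is first defined in this notebook
s_demo = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])

# NOT: keep values that are not greater than 1
not_above_one = s_demo[~(s_demo > 1)]

# OR: keep values below -1 or above 2
outside_range = s_demo[(s_demo < -1) | (s_demo > 2)]

# AND: keep values strictly between 0 and 5
inside_range = s_demo[(s_demo > 0) & (s_demo < 5)]

print(not_above_one, outside_range, inside_range, sep='\n\n')
```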


## Setting

In [28]:
s['a']=6

In [29]:
s

a    6
b   -5
c    7
d    4
dtype: int64

## Apply Functions

In [51]:
f= lambda x:x*2
df.apply(f)

          Country             Capital  Population
0  BelgiumBelgium    BrusselsBrussels    22381692
1      IndiaIndia  New DelhiNew Delhi  2606342070
2    BrazilBrazil    BrasiliaBrasilia   415695056


In [52]:
# applymap works element-wise; since pandas 2.1 it is deprecated in favor of DataFrame.map
df.applymap(f)

          Country             Capital  Population
0  BelgiumBelgium    BrusselsBrussels    22381692
1      IndiaIndia  New DelhiNew Delhi  2606342070
2    BrazilBrazil    BrasiliaBrasilia   415695056
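
With `x*2` the two calls return the same result, which hides how they differ: `apply` hands each whole column (a Series) to the function, while the element-wise method hands it one cell at a time. The sketch below uses `len` to make the difference visible. The names `df_demo`, `per_column`, and `per_cell` are illustrative, and the `getattr` fallback is only an assumption about which pandas version is installed (`DataFrame.map` exists from pandas 2.1).

```python
import pandas as pd

# Illustrative frame with the two string columns from this notebook
df_demo = pd.DataFrame({
    'Country': ['Belgium', 'India', 'Brazil'],
    'Capital': ['Brussels', 'New Delhi', 'Brasilia'],
})

# apply: len() receives each column as a Series, so it returns the row count
per_column = df_demo.apply(len)

# Element-wise: prefer DataFrame.map when available, else fall back to applymap
elementwise = getattr(df_demo, 'map', None) or df_demo.applymap
per_cell = elementwise(len)  # len() receives each string, so it returns its length

print(per_column)
print(per_cell)
```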


## Internal Data Alignment

In [53]:
s3= pd.Series([7,-2,3] , index=['a','c','d'])
s+s3

a    13.0
b     NaN
c     5.0
d     7.0
dtype: float64
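
The `NaN` for `b` appears because pandas aligns both Series on the union of their index labels before adding, and `b` has no partner in `s3`. The result becomes `float64` because `NaN` cannot be stored in an integer column. The cell below is a small sketch of that alignment step using `Series.align`; the names `s_left`, `s_right`, and the others are illustrative copies, not objects from the notebook.

```python
import pandas as pd

# Illustrative copies of s (after s['a'] = 6) and s3
s_left = pd.Series([6, -5, 7, 4], index=['a', 'b', 'c', 'd'])
s_right = pd.Series([7, -2, 3], index=['a', 'c', 'd'])

# align() reindexes both operands to the union of their labels
left_aligned, right_aligned = s_left.align(s_right)

# Adding the aligned Series gives the same result as s_left + s_right
total = s_left + s_right

# Labels present in only one operand end up as NaN
missing_labels = total[total.isna()].index.tolist()

print(right_aligned)
print(total)
print(missing_labels)
```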

## Arithmetic Operations with Fill Methods

In [54]:
s.add(s3,fill_value=0)

a    13.0
b    -5.0
c     5.0
d     7.0
dtype: float64

In [55]:
s.sub(s3,fill_value=2)

a   -1.0
b   -7.0
c    9.0
d    1.0
dtype: float64

In [56]:
s.div(s3,fill_value=4)

a    0.857143
b   -1.250000
c   -3.500000
d    1.333333
dtype: float64

In [57]:
s.mul(s3,fill_value=3)

a    42.0
b   -15.0
c   -14.0
d    12.0
dtype: float64

## Sort And Rank

In [58]:
df.sort_index()

   Country    Capital  Population
0  Belgium   Brussels    11190846
1    India  New Delhi  1303171035
2   Brazil   Brasilia   207847528


In [59]:
df.sort_values(by='Country')

   Country    Capital  Population
0  Belgium   Brussels    11190846
2   Brazil   Brasilia   207847528
1    India  New Delhi  1303171035


In [60]:
df.rank()

   Country  Capital  Population
0      1.0      2.0         1.0
1      3.0      3.0         3.0
2      2.0      1.0         2.0


## Dropping

In [61]:
s.drop(['a','c'])

b   -5
d    4
dtype: int64

In [62]:
df.drop('Country',axis=1)

     Capital  Population
0   Brussels    11190846
1  New Delhi  1303171035
2   Brasilia   207847528
