<h1 style="color:yellow;background-color:green;font-size:250%;border:2px solid black;text-align:center;"> Intro to Pandas</h1>

When working with tabular data, such as data stored in spreadsheets or databases, pandas is the right tool for you. pandas will help you explore, clean, and process your data. In pandas, a data table is called a DataFrame.

<h1 style = "font-size:200%;color:DodgerBlue;"> Load Pandas</h1>


To load the pandas package and start working with it, import the package. The community-agreed alias for pandas is pd, so importing pandas as pd is treated as standard practice throughout the pandas documentation.

In [23]:
import pandas as pd

<h1><span style="color:#FF0000">Creating a</span>
<span style="color:#66CC66">DataFrame.</span></h1>

In [24]:
# df is the conventional short name for a DataFrame
df = pd.DataFrame({
    'Name': ['Satyam', 'Vikas', 'Mayuree', 'Eshika', 'Aakas'],
    'Age': list(range(30, 80, 10)),        # 30, 40, 50, 60, 70
    'Experience': list(range(5, 18, 3)),   # 5, 8, 11, 14, 17
})
df

,Name,Age,Experience
0,Satyam,30,5
1,Vikas,40,8
2,Mayuree,50,11
3,Eshika,60,14
4,Aakas,70,17


Notice that the inferred dtypes are int64 for the Age and Experience columns and object for the Name column.

In [25]:
df.dtypes  

Name          object
Age            int64
Experience     int64
dtype: object

To enforce a single dtype:

In [26]:
import numpy as np
# Passing dtype forces every column to the same type
df = pd.DataFrame({
    'Name': ['Satyam', 'Vikas', 'Mayuree', 'Eshika', 'Aakas'],
    'Age': list(range(30, 80, 10)),
    'Experience': list(range(5, 18, 3))}, dtype=str)
df.dtypes

Name          object
Age           object
Experience    object
dtype: object
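
When only some columns need a different dtype, `astype` can convert them after construction. The sketch below is a minimal illustration; the name `df_typed` and the choice of converting `Age` to `float64` are assumptions made for the example.

```python
import pandas as pd

df_typed = pd.DataFrame({
    'Name': ['Satyam', 'Vikas'],
    'Age': [30, 40],
})

# Convert only the Age column; Name is left unchanged.
df_typed = df_typed.astype({'Age': 'float64'})
print(df_typed.dtypes)
```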

Constructing DataFrame from a dictionary including Series:

In [27]:
df = pd.DataFrame({
    'Name':['Satyam','Vikas','Mayuree','Eshika','Aakas'],
    'Age':[i for i in range(30,80,10)],
    'Experience':pd.Series([i for i in range(5,18,3)])
})
df

,Name,Age,Experience
0,Satyam,30,5
1,Vikas,40,8
2,Mayuree,50,11
3,Eshika,60,14
4,Aakas,70,17
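
One point worth knowing when a dictionary contains Series: the values are aligned on the Series index rather than by position. The following sketch uses made-up data (`experience`, `df_aligned`) in which one Series has no entry for row 1, so that cell ends up missing.

```python
import pandas as pd

# Experience only has entries for index labels 0 and 2.
experience = pd.Series([5, 11], index=[0, 2])

df_aligned = pd.DataFrame({
    'Name': pd.Series(['Satyam', 'Vikas', 'Mayuree']),
    'Experience': experience,
})

# Row 1 has no matching label in `experience`, so it holds NaN.
print(df_aligned)
```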


Constructing DataFrame from numpy ndarray:

In [28]:
# Generate random column names (random_word is a third-party package)
from random_word import RandomWords
r = RandomWords()

labels = [r.get_random_word() for _ in range(10)]
arr = np.random.randint(1, 1000, 100).reshape(10, 10)
df = pd.DataFrame(arr, columns=labels)
df

,languishment,dividendus,omophagies,lida,venizelist,lippier,buttonhook,noninductive,sigillistic,tonneaus
0,92,883,268,271,565,420,890,506,242,298
1,804,735,343,672,441,777,657,317,765,867
2,506,54,123,356,12,719,150,268,13,614
3,989,424,293,412,760,567,678,767,896,596
4,285,614,516,79,289,216,947,16,762,705
5,768,771,368,98,993,162,844,867,900,994
6,379,593,441,932,419,917,86,392,626,765
7,546,817,88,838,55,457,596,605,302,467
8,421,682,352,845,963,545,536,716,844,56
9,284,618,684,665,672,897,181,642,907,590


<h1 style = "font-size:200%;color:DodgerBlue;">Attributes</h1>

<h1><span style="color:#7E30AA">𝓓𝓪𝓽𝓪𝓕𝓻𝓪𝓶𝓮.</span>
<span style="color:#66CC66">𝓪𝓽</span></h1>

Access a single value for a row/column label pair.

Similar to loc, in that both provide label-based lookups. Use at if you only need to get or set a single value in a DataFrame or Series.

In [30]:
df.at[6,'dividendus']

593
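
Since the column names above are random, here is a small reproducible sketch of getting and setting a single cell with `at`. The frame `df_at` and its labels are illustrative assumptions.

```python
import pandas as pd

df_at = pd.DataFrame({'Age': [30, 40, 50]}, index=['a', 'b', 'c'])

value = df_at.at['b', 'Age']   # read one cell by row label and column label
df_at.at['b', 'Age'] = 41      # overwrite the same cell
print(value, df_at.at['b', 'Age'])
```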


<h1><span style="color:#7E30AA">𝓓𝓪𝓽𝓪𝓕𝓻𝓪𝓶𝓮.</span>
<span style="color:#66CC66">𝓪𝔁𝓮𝓼</span></h1>

Return a list representing the axes of the DataFrame.

It has the row axis labels and column axis labels as the only members. They are returned in that order.

In [None]:
df.axes
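
As a quick illustration of the ordering, the list can be unpacked into its two members. The frame `df_axes` is a made-up example.

```python
import pandas as pd

df_axes = pd.DataFrame({'Name': ['Satyam', 'Vikas'], 'Age': [30, 40]})

row_axis, column_axis = df_axes.axes   # row labels first, then column labels
print(list(row_axis))      # [0, 1]
print(list(column_axis))   # ['Name', 'Age']
```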

<h1><span style="color:#7E30AA">𝓓𝓪𝓽𝓪𝓕𝓻𝓪𝓶𝓮.</span>
<span style="color:#66CC66">𝓬𝓸𝓵𝓾𝓶𝓷𝓼</span></h1>

The column labels of the DataFrame.

In [None]:
df.columns

<h1><span style="color:#7E30AA">𝓓𝓪𝓽𝓪𝓕𝓻𝓪𝓶𝓮.</span>
<span style="color:#66CC66">𝓭𝓽𝔂𝓹𝓮𝓼</span></h1>

Return the dtypes in the DataFrame.

This returns a Series with the data type of each column. The result’s index is the original DataFrame’s columns. Columns with mixed types are stored with the object dtype.

In [None]:
df.dtypes
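
To see the mixed-type rule in action, the sketch below builds a hypothetical frame (`df_mixed`) with one purely numeric column and one column holding an int, a string, and a float.

```python
import pandas as pd

df_mixed = pd.DataFrame({
    'Age': [30, 40, 50],         # all integers
    'Mixed': [1, 'two', 3.0],    # int, str and float in one column
})

# The result is a Series indexed by the column names.
print(df_mixed.dtypes)
```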

<h1><span style="color:#7E30AA">𝓓𝓪𝓽𝓪𝓕𝓻𝓪𝓶𝓮.</span>
<span style="color:#66CC66">𝓮𝓶𝓹𝓽𝔂</span></h1>

Check whether a Series/DataFrame is completely empty, meaning that at least one of its axes has length 0.

In [None]:
df.empty
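
A frame that contains only missing values is not considered empty, which is easy to overlook. The two small frames below (`no_rows`, `only_nan`) are illustrative.

```python
import numpy as np
import pandas as pd

no_rows = pd.DataFrame({'Age': []})          # a column but no rows
only_nan = pd.DataFrame({'Age': [np.nan]})   # one row holding a missing value

print(no_rows.empty)    # True
print(only_nan.empty)   # False, a NaN still counts as an item
```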

<h1><span style="color:#7E30AA">𝓓𝓪𝓽𝓪𝓕𝓻𝓪𝓶𝓮.</span>
<span style="color:#66CC66">𝓲𝓪𝓽</span></h1>

Access a single value for a row/column pair by integer position.

Similar to iloc, in that both provide integer-based lookups. Use iat if you only need to get or set a single value in a DataFrame or Series.

In [None]:
df.iat[0,0]
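
The sketch below shows that `iat` ignores the labels and works purely by position, for both reading and writing. The frame `df_iat` is a made-up example with string row labels.

```python
import pandas as pd

df_iat = pd.DataFrame({'Age': [30, 40, 50]}, index=['a', 'b', 'c'])

first = df_iat.iat[0, 0]   # row position 0, column position 0
df_iat.iat[2, 0] = 55      # set the cell in the last row
print(first, df_iat.iat[2, 0])
```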

<h1><span style="color:#7E30AA">𝓓𝓪𝓽𝓪𝓕𝓻𝓪𝓶𝓮.</span>
<span style="color:#66CC66">𝓲𝓵𝓸𝓬</span></h1>

Purely integer-location based indexing for selection by position.

.iloc[] is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array.

Allowed inputs are:

An integer, e.g. 5.

A list or array of integers, e.g. [4, 3, 0].

A slice object with ints, e.g. 1:7.

A boolean array.

A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above). This is useful in method chains, when you don’t have a reference to the calling object, but would like to base your selection on some value.

.iloc will raise IndexError if a requested indexer is out-of-bounds, except slice indexers, which allow out-of-bounds indexing (this conforms with Python/NumPy slice semantics).

In [None]:
df.iloc[0:,0:]
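
The allowed inputs listed above can be tried on a small frame. This is a minimal sketch; `df_pos` and the variable names are assumptions chosen for illustration.

```python
import pandas as pd

df_pos = pd.DataFrame(
    {'Name': ['Satyam', 'Vikas', 'Mayuree'], 'Age': [30, 40, 50]}
)

row = df_pos.iloc[1]                          # single integer: row 1 as a Series
picked = df_pos.iloc[[2, 0]]                  # list of positions, in the given order
sliced = df_pos.iloc[0:2, 0:1]                # slices exclude the stop position
masked = df_pos.iloc[[True, False, True]]     # boolean list, one flag per row
via_callable = df_pos.iloc[lambda d: [0, 1]]  # callable returning positions
beyond = df_pos.iloc[1:10]                    # out-of-bounds slice is clipped

try:
    df_pos.iloc[10]                           # out-of-bounds integer
except IndexError as exc:
    print('IndexError:', exc)
```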

<h1><span style="color:#7E30AA">𝓓𝓪𝓽𝓪𝓕𝓻𝓪𝓶𝓮.</span>
<span style="color:#66CC66">𝓲𝓷𝓭𝓮𝔁</span></h1>

The index (row labels) of the DataFrame.

In [None]:
df.index

<h1><span style="color:#7E30AA">𝓓𝓪𝓽𝓪𝓕𝓻𝓪𝓶𝓮.</span>
<span style="color:#66CC66">𝓵𝓸𝓬</span></h1>

Access a group of rows and columns by label(s) or a boolean array.

.loc[] is primarily label based, but may also be used with a boolean array.

Allowed inputs are:

A single label, e.g. 5 or 'a', (note that 5 is interpreted as a label of the index, and never as an integer position along the index).

A list or array of labels, e.g. ['a', 'b', 'c'].

A slice object with labels, e.g. 'a':'f'.

Warning: contrary to usual Python slices, both the start and the stop are included.

A boolean array of the same length as the axis being sliced, e.g. [True, False, True].

An alignable boolean Series. The index of the key will be aligned before masking.

An alignable Index. The Index of the returned selection will be the input.

A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above).

In [None]:
df.loc[5]
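
The allowed inputs listed above, including the inclusive label slice, can be sketched on a small frame with string labels. The frame `df_lab` and the variable names are illustrative assumptions.

```python
import pandas as pd

df_lab = pd.DataFrame(
    {'Age': [30, 40, 50, 60], 'Experience': [5, 8, 11, 14]},
    index=['a', 'b', 'c', 'd'],
)

one = df_lab.loc['b']                       # single label: one row as a Series
several = df_lab.loc[['a', 'c'], ['Age']]   # lists of row and column labels
span = df_lab.loc['a':'c']                  # label slice: 'c' is included
older = df_lab.loc[df_lab['Age'] > 40]      # boolean Series aligned on the index
experienced = df_lab.loc[lambda d: d['Experience'] >= 11]  # callable

print(span)
```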

<h1><span style="color:#7E30AA">𝓓𝓪𝓽𝓪𝓕𝓻𝓪𝓶𝓮.</span>
<span style="color:#66CC66">𝓷𝓭𝓲𝓶</span></h1>

Return an int representing the number of axes / array dimensions.

Return 1 if Series. Otherwise return 2 if DataFrame.

In [None]:
df.ndim

<h1><span style="color:#7E30AA">𝓓𝓪𝓽𝓪𝓕𝓻𝓪𝓶𝓮.</span>
<span style="color:#66CC66">𝓼𝓱𝓪𝓹𝓮</span></h1>

Return a tuple representing the dimensionality of the DataFrame.

In [None]:
df.shape

<h1><span style="color:#7E30AA">𝓓𝓪𝓽𝓪𝓕𝓻𝓪𝓶𝓮.</span>
<span style="color:#66CC66">𝓼𝓲𝔃𝓮</span></h1>

Return an int representing the number of elements in this object.

Return the number of rows if Series. Otherwise return the number of rows times number of columns if DataFrame.

In [None]:
df.size
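
The relationship between the row count, the column count, and the number of elements can be checked directly. The frame `df_dim` is a made-up example.

```python
import pandas as pd

df_dim = pd.DataFrame({'Age': [30, 40, 50], 'Experience': [5, 8, 11]})

rows, cols = df_dim.shape
print(df_dim.shape)         # (3, 2)
print(df_dim.size)          # 6, which equals rows * cols
print(df_dim['Age'].size)   # 3, a Series counts only its rows
```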

<h1><span style="color:#7E30AA">𝓓𝓪𝓽𝓪𝓕𝓻𝓪𝓶𝓮.</span>
<span style="color:#66CC66">𝓿𝓪𝓵𝓾𝓮𝓼</span></h1>

Return a NumPy representation of the DataFrame.

In [None]:
df.values
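
Because a NumPy array has a single dtype, columns of different numeric types are upcast to a common one. The sketch below illustrates this on a hypothetical frame (`df_val`) and also calls `to_numpy()`, the method form that yields the same kind of array.

```python
import pandas as pd

df_val = pd.DataFrame({'Age': [30, 40], 'Score': [1.5, 2.5]})

arr = df_val.values        # int and float columns share one float dtype
same = df_val.to_numpy()   # method form returning an equivalent array
print(arr.dtype, arr.shape)
```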

<h1><span style="color:#7E30AA">Read and Write</span>
<span style="color:#66CC66">Tabular data</span></h1>

pandas integrates with many file formats and data sources out of the box (csv, excel, sql, json, parquet, …). Functions with the prefix read_* import data from each of these sources, and the corresponding to_* methods store data.
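
A minimal round trip with the CSV pair `to_csv` / `read_csv` might look like the sketch below. It writes to an in-memory `io.StringIO` buffer instead of a file so that it runs anywhere; the names `df_out`, `buffer`, and `df_in` are assumptions for the example.

```python
import io

import pandas as pd

df_out = pd.DataFrame({'Name': ['Satyam', 'Vikas'], 'Age': [30, 40]})

buffer = io.StringIO()
df_out.to_csv(buffer, index=False)   # write without the row index
buffer.seek(0)                       # rewind before reading

df_in = pd.read_csv(buffer)          # read the same text back
print(df_in)
```

Passing a file path instead of the buffer works the same way for both calls.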