# PANDAS

Pandas is an open-source Python library for data manipulation and analysis. It provides many functions and methods that speed up the data analysis process. Pandas is built on top of the NumPy package and takes a lot of its basic design from it.

The name is derived from the term "panel data", an econometrics term for data sets that include observations over multiple time periods for the same individuals.

Pandas has two primary data structures: the Series, which is one-dimensional, and the DataFrame, which is two-dimensional.
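You can check the dimensionality of each structure through its `ndim` and `shape` attributes. Here is a small sketch. The variable names `s` and `df` are chosen only for illustration, and the values are taken from examples later in this notebook.

```python
import pandas as pd

# A Series holds one labelled sequence of values
s = pd.Series([11, 23, 34])
# A DataFrame holds several columns that share one row index
df = pd.DataFrame({"Yes": [50, 21], "No": [131, 2]})

print(s.ndim, s.shape)    # 1 (3,)
print(df.ndim, df.shape)  # 2 (2, 2)
```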

![create-series-in-python-pandas-0.png](attachment:create-series-in-python-pandas-0.png)

# Install Pandas

In [3]:
# pip install pandas  # install the package once (from a terminal, or with %pip install pandas in a notebook)

In [5]:
import pandas as pd

# Series

A Series is a one-dimensional sequence of labelled data values. If a DataFrame is a table, a Series is a single column of that table, much like a list. In fact, you can create one from nothing more than a list:

In [5]:
pd.Series([11, 23, 34, 41, 55])

0    11
1    23
2    34
3    41
4    55
dtype: int64

In [12]:
pd.Series([11, 23, 34, 41, 55], index=['Coca Cola', 'Sprite', 'Coke', 'Fanta', 'Dew'], name="Drinks")

Coca Cola    11
Sprite       23
Coke         34
Fanta        41
Dew          55
Name: Drinks, dtype: int64
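The Series above has custom labels, so you can look up values by label with `.loc` or by position with `.iloc`. The sketch below uses the same data and assigns it to a variable called `drinks`, a name chosen here for illustration.

```python
import pandas as pd

drinks = pd.Series([11, 23, 34, 41, 55],
                   index=['Coca Cola', 'Sprite', 'Coke', 'Fanta', 'Dew'],
                   name="Drinks")

print(drinks.loc['Fanta'])  # look up by label -> 41
print(drinks.iloc[0])       # look up by position -> 11
print(drinks.name)          # the name passed to the constructor -> Drinks
print(list(drinks.index))   # ['Coca Cola', 'Sprite', 'Coke', 'Fanta', 'Dew']
```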

# DataFrame

A DataFrame is a table: a two-dimensional array of individual entries, each of which holds a value. Each entry sits at the intersection of a row (or record) and a column.

In [13]:
pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

|   | Yes | No |
|---|-----|----|
| 0 | 50 | 131 |
| 1 | 21 | 2 |


In [14]:
pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 'Sue': ['Pretty good.', 'Bland.']})

|   | Bob | Sue |
|---|-----|-----|
| 0 | I liked it. | Pretty good. |
| 1 | It was awful. | Bland. |


We use the pd.DataFrame() constructor to create these DataFrame objects. To declare a new one, pass a dictionary whose keys are the column names (Bob and Sue in this example) and whose values are lists of entries, one per row. This is the standard way of constructing a new DataFrame, and the one you are most likely to encounter.

The dictionary-list constructor assigns values to the column labels, but just uses an ascending count from 0 (0, 1, 2, 3, ...) for the row labels. Sometimes this is OK, but oftentimes we will want to assign these labels ourselves.
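To confirm what the dictionary-list constructor produces, you can inspect the columns and index of the result. If you omit `index`, the row labels form a `RangeIndex` that counts up from 0. The variable name `reviews` is chosen here for illustration.

```python
import pandas as pd

reviews = pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'],
                        'Sue': ['Pretty good.', 'Bland.']})

# Dictionary keys become the column labels, in insertion order
print(list(reviews.columns))         # ['Bob', 'Sue']
# Without an index argument, row labels default to 0, 1, 2, ...
print(list(reviews.index))           # [0, 1]
print(type(reviews.index).__name__)  # RangeIndex
```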

The list of row labels used in a DataFrame is known as an Index. We can assign values to it by using an index parameter in our constructor:

In [30]:
pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 
              'Sue': ['Pretty good.', 2]},
             index=['Product A', 'Product B'])

|   | Bob | Sue |
|---|-----|-----|
| Product A | I liked it. | Pretty good. |
| Product B | It was awful. | 2 |


# 1. Read

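pd.read_csv() loads a comma-separated file into a DataFrame. A relative path such as "Data.csv" is resolved against the current working directory, which you can check with pwd before reading. Note: pandas has similar readers for other file types, such as read_json(), read_html(), and read_excel(), which you can use as needed.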

In [17]:
%pwd  # IPython magic that shows the current working directory (plain Python: os.getcwd())

'C:\\Users\\kulbh\\gamaka'

In [11]:
df = pd.read_csv("Data.csv")  # relative path, resolved against the working directory

In [12]:
df

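|   | name | maths | phy | result |
|---|------|-------|-----|--------|
| 0 | tom | 80 | 70 | Pass |
| 1 | jhon | 75 | 67 | Pass |
| 2 | max | 28 | 55 | Fail |
| 3 | raj | 88 | 70 | Pass |
| 4 | shyam | 23 | 29 | Fail |
| 5 | ram | 95 | 89 | Pass |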
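If you don't have Data.csv, you can build the same DataFrame without a file by passing read_csv() an in-memory text buffer. This sketch assumes the file contains a header row `name,maths,phy,result` followed by the six records shown above. The notebook does not show the file itself, so that layout is inferred from the output.

```python
import io
import pandas as pd

# Assumed contents of Data.csv, rebuilt from the output above
csv_text = """name,maths,phy,result
tom,80,70,Pass
jhon,75,67,Pass
max,28,55,Fail
raj,88,70,Pass
shyam,23,29,Fail
ram,95,89,Pass
"""

# read_csv accepts any file-like object, not only a path on disk
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)          # (6, 4)
print(list(df.columns))  # ['name', 'maths', 'phy', 'result']
```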


In [22]:
type(df)

pandas.core.frame.DataFrame

In [23]:
df["maths"]

0    80
1    75
2    28
3    88
4    23
5    95
Name: maths, dtype: int64

In [25]:
df[["maths","name"]]

|   | maths | name |
|---|-------|------|
| 0 | 80 | tom |
| 1 | 75 | jhon |
| 2 | 28 | max |
| 3 | 88 | raj |
| 4 | 23 | shyam |
| 5 | 95 | ram |


In [28]:
df[["result","name", 'maths']]

|   | result | name | maths |
|---|--------|------|-------|
| 0 | Pass | tom | 80 |
| 1 | Pass | jhon | 75 |
| 2 | Fail | max | 28 |
| 3 | Pass | raj | 88 |
| 4 | Fail | shyam | 23 |
| 5 | Pass | ram | 95 |
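The number of brackets decides what you get back. A single column name returns a Series. A list of names (double brackets) returns a DataFrame whose columns appear in the order you listed them. The check below uses a trimmed copy of the same data.

```python
import pandas as pd

df = pd.DataFrame({"name": ["tom", "jhon", "max"],
                   "maths": [80, 75, 28],
                   "result": ["Pass", "Pass", "Fail"]})

col = df["maths"]             # one label -> Series
sub = df[["result", "name"]]  # list of labels -> DataFrame, in the listed order

print(type(col).__name__)  # Series
print(type(sub).__name__)  # DataFrame
print(list(sub.columns))   # ['result', 'name']
```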


# 2. head()
The head() method displays the first 5 rows of the DataFrame by default. To display a different number of rows, pass the count as an argument. For example, df.head(10) displays the first 10 rows, or every row if the DataFrame has fewer than 10.
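head(n) selects the first n rows by position, so it returns the same rows as `.iloc[:n]`. The sketch below uses the name and maths columns from this notebook's data.

```python
import pandas as pd

df = pd.DataFrame({"name": ["tom", "jhon", "max", "raj", "shyam", "ram"],
                   "maths": [80, 75, 28, 88, 23, 95]})

print(len(df.head()))    # 5 (default n=5)
print(len(df.head(2)))   # 2
print(len(df.head(10)))  # 6 (asking for more rows than exist returns them all)

# head(n) selects the same rows as positional slicing
print(df.head(2).equals(df.iloc[:2]))  # True
```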

In [36]:
df.head()

|   | name | maths | phy | result |
|---|------|-------|-----|--------|
| 0 | tom | 80 | 70 | Pass |
| 1 | jhon | 75 | 67 | Pass |
| 2 | max | 28 | 55 | Fail |
| 3 | raj | 88 | 70 | Pass |
| 4 | shyam | 23 | 29 | Fail |


In [37]:
df.head(2)

|   | name | maths | phy | result |
|---|------|-------|-----|--------|
| 0 | tom | 80 | 70 | Pass |
| 1 | jhon | 75 | 67 | Pass |


# 3. tail()

The tail() method works the same way from the other end: by default it shows the last 5 rows, and tail(n) shows the last n rows.
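tail() keeps the original row labels, so the rows it returns still carry their index values. That is why the output below starts at 1 rather than 0. This is a sketch using the same stand-in data as the head() example.

```python
import pandas as pd

df = pd.DataFrame({"name": ["tom", "jhon", "max", "raj", "shyam", "ram"],
                   "maths": [80, 75, 28, 88, 23, 95]})

print(list(df.tail().index))            # [1, 2, 3, 4, 5]
print(list(df.tail(2).index))           # [4, 5]
# tail(n) selects the same rows as a negative positional slice
print(df.tail(2).equals(df.iloc[-2:]))  # True
```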

In [38]:
df.tail()

|   | name | maths | phy | result |
|---|------|-------|-----|--------|
| 1 | jhon | 75 | 67 | Pass |
| 2 | max | 28 | 55 | Fail |
| 3 | raj | 88 | 70 | Pass |
| 4 | shyam | 23 | 29 | Fail |
| 5 | ram | 95 | 89 | Pass |


In [42]:
df.tail(2)

|   | name | maths | phy | result |
|---|------|-------|-----|--------|
| 4 | shyam | 23 | 29 | Fail |
| 5 | ram | 95 | 89 | Pass |


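# 4. Read tab-separated files

pd.read_table() reads tab-separated (.tsv) files because its default separator is a tab. pd.read_csv() expects commas by default, so reading the same file with it leaves each line in a single column until you pass sep="\t".

In [43]:
df2 = pd.read_table("tdata.tsv")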

In [44]:
df2

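|   | name | maths | phy | result |
|---|------|-------|-----|--------|
| 0 | tom | 80 | 70 | Pass |
| 1 | jhon | 75 | 67 | Pass |
| 2 | max | 28 | 55 | Fail |
| 3 | raj | 88 | 70 | Pass |
| 4 | shyam | 23 | 29 | Fail |
| 5 | ram | 95 | 89 | Pass |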


In [45]:
df3 = pd.read_csv("tdata.tsv")  # default sep="," does not split on tabs

In [46]:
df3

|   | `name\tmaths\tphy\tresult` |
|---|----------------------------|
| 0 | `tom\t80\t70\tPass` |
| 1 | `jhon\t75\t67\tPass` |
| 2 | `max\t28\t55\tFail` |
| 3 | `raj\t88\t70\tPass` |
| 4 | `shyam\t23\t29\tFail` |
| 5 | `ram\t95\t89\tPass` |


In [6]:
df3 = pd.read_csv("tdata.tsv", sep="\t")  # split fields on tab characters
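The difference between these reads comes down to the separator. The sketch below uses an in-memory string. It assumes tdata.tsv holds the same records as Data.csv, separated by tabs, which is what the outputs above suggest. Only two records are used for brevity.

```python
import io
import pandas as pd

tsv_text = "name\tmaths\tphy\tresult\ntom\t80\t70\tPass\njhon\t75\t67\tPass\n"

# read_csv splits on commas by default, so each whole line lands in one column
wrong = pd.read_csv(io.StringIO(tsv_text))
print(wrong.shape)  # (2, 1)

# Passing sep="\t" splits on tabs instead
right = pd.read_csv(io.StringIO(tsv_text), sep="\t")
print(right.shape)  # (2, 4)

# read_table uses a tab as its default separator, so it gives the same result
same = pd.read_table(io.StringIO(tsv_text))
print(same.equals(right))  # True
```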

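# 5. Slicing rows

Square-bracket slicing with integers, written as [start:stop:step], selects rows by position, like slicing a Python list: stop is excluded, and a negative step walks backwards.

In [17]:
df3[0:2]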

|   | name | maths | phy | result |
|---|------|-------|-----|--------|
| 0 | tom | 80 | 70 | Pass |
| 1 | jhon | 75 | 67 | Pass |


In [19]:
df3[0:4:2]

|   | name | maths | phy | result |
|---|------|-------|-----|--------|
| 0 | tom | 80 | 70 | Pass |
| 2 | max | 28 | 55 | Fail |


In [20]:
df3[-1::-2]

|   | name | maths | phy | result |
|---|------|-------|-----|--------|
| 5 | ram | 95 | 89 | Pass |
| 3 | raj | 88 | 70 | Pass |
| 1 | jhon | 75 | 67 | Pass |
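Integer slicing inside square brackets selects rows by position, just as `.iloc` does. The sketch below repeats the three slices above on a stand-in frame built from the same data. Writing `.iloc` explicitly makes the positional intent clear to readers.

```python
import pandas as pd

df3 = pd.DataFrame({"name": ["tom", "jhon", "max", "raj", "shyam", "ram"],
                    "maths": [80, 75, 28, 88, 23, 95]})

print(list(df3[0:2].index))     # [0, 1]     rows 0 up to (not including) 2
print(list(df3[0:4:2].index))   # [0, 2]     every second row from 0 to 3
print(list(df3[-1::-2].index))  # [5, 3, 1]  backwards from the last row, step 2

# The same selection written with the explicit positional indexer
print(df3[0:4:2].equals(df3.iloc[0:4:2]))  # True
```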
