### Pandas

In [1]:
import numpy as np
import pandas as pd

### 1. Working with Pandas Series
#### a) Creating Series
A pandas Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.). The axis labels are collectively called the index. Labels need not be unique but must be of a hashable type. A Series supports both integer-based and label-based indexing and provides a host of methods for operations involving the index.

In [2]:
pd.__version__

'2.1.4'

Series through list

In [3]:
lst = [1,2,3,4,5]
pd.Series(lst) # Index:  Elements

0    1
1    2
2    3
3    4
4    5
dtype: int64

Series through Numpy array

In [5]:
arr = np.array([1,2,3,4,5])
pd.Series(arr)

0    1
1    2
2    3
3    4
4    5
dtype: int64

Supplying our own index

In [7]:
pd.Series(index=['Isha','Jaha','Chavvi','Mira'],data=[1,2,3,4])

Isha      1
Jaha      2
Chavvi    3
Mira      4
dtype: int64

Series through a dictionary (the keys become the index)

In [9]:
steps = {'day_1':4000,'day_2':3000,'day_3':10000}
pd.Series(steps)

day_1     4000
day_2     3000
day_3    10000
dtype: int64

#### Using the repeat function when creating a Series
- Pandas Series.repeat() repeats the elements of a Series. It returns a new Series in which each element of the original Series is repeated consecutively a given number of times.

In [11]:
pd.Series(5).repeat(7) # Each repeated element keeps its original index label, so every row has index 0

0    5
0    5
0    5
0    5
0    5
0    5
0    5
dtype: int64

We can use reset_index(drop=True) to renumber the index from 0.

In [16]:
pd.Series(5).repeat(3).reset_index(drop=True)

0    5
1    5
2    5
dtype: int64

This code indicates:
- 10 should be repeated 5 times
- 20 should be repeated 2 times

In [20]:
s= pd.Series([10,20]).repeat([5,2]).reset_index(drop=True)
s

0    10
1    10
2    10
3    10
4    10
5    20
6    20
dtype: int64

Accessing Elements

In [22]:
s[0] # Element at index=0

10

In [23]:
s[5] # Returns element at index=5

20

In [24]:
s[1:4] # Slicing

1    10
2    10
3    10
dtype: int64

In [25]:
s[1:-1] # Slice that drops the first and last elements

1    10
2    10
3    10
4    10
5    20
dtype: int64

#### b) Aggregate function on pandas Series
Pandas Series.aggregate() (alias Series.agg()) aggregates the values of the given Series object using one or more operations, such as sum, min, or mean.
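A minimal sketch of aggregation, assuming a small Series with the same values as the repeat example above. Passing one operation name returns a scalar, while passing a list of names returns a Series indexed by those names.

```python
import pandas as pd

# Same values as the repeat example: five 10s followed by two 20s.
s = pd.Series([10, 10, 10, 10, 10, 20, 20])

# A single operation returns a scalar.
total = s.agg("sum")

# A list of operations returns a Series labeled by operation name.
summary = s.agg(["min", "max", "mean"])

print(total)
print(summary)
```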

#### c) Series absolute function
The Series.abs() method returns the absolute numeric value of each element in a Series (DataFrame.abs() does the same for a DataFrame).
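As a small illustration, assuming a hypothetical Series of signed numbers, abs() returns a new Series and leaves the original unchanged:

```python
import pandas as pd

# Hypothetical signed values, chosen only for illustration.
temps = pd.Series([-3.5, 0.0, 2.25, -10])

# abs() works element-wise and returns a new Series.
abs_temps = temps.abs()

print(abs_temps)
```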

#### d) Appending Series
* Pandas Series.append() was used to concatenate two or more Series objects. It was deprecated in pandas 1.4 and removed in pandas 2.0, so with the version shown above (2.1.4) use pd.concat() instead.

* Syntax (pandas versions before 2.0 only): Series.append(to_append, ignore_index=False, verify_integrity=False)

* Parameters:
  * to_append: Series or list/tuple of Series.
  * ignore_index: if True, do not use the index labels.
  * verify_integrity: if True, raise an exception when the result would contain duplicate index labels.
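Because Series.append() is not available in pandas 2.x, the sketch below uses pd.concat(), which covers the same use case. The two input Series are made up for illustration. pd.concat() takes ignore_index and verify_integrity arguments with the same meaning as described above.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3])
s2 = pd.Series([4, 5])

# Default: the original index labels are kept, so labels 0 and 1 repeat.
combined = pd.concat([s1, s2])

# ignore_index=True discards the old labels and renumbers from 0.
renumbered = pd.concat([s1, s2], ignore_index=True)

print(combined)
print(renumbered)
```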

#### e) Astype function
Pandas astype() is one of the most important methods. It is used to change the data type of a Series. When a DataFrame is built from a CSV file, the data type of each column is inferred automatically, and the inferred type is often not the one the column should actually have.
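A minimal sketch of a type conversion, assuming a column of numbers that was read in as text (a common situation after loading a CSV file). The values are hypothetical.

```python
import pandas as pd

# Numbers stored as strings, as they might arrive from a text file.
raw = pd.Series(["1", "2", "3"])

# Convert the strings to 64-bit integers, then to floats.
as_int = raw.astype("int64")
as_float = as_int.astype("float64")

print(as_int.dtype)
print(as_float.dtype)
```

astype() returns a new Series, so the result has to be assigned back if the converted values are to be kept.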

#### f) Between Function
The pandas between() method is used on a Series to check which values lie between the first and second argument. It returns a boolean Series, and both boundaries are included by default.
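The following sketch, using made-up values, shows the boolean mask that between() returns and how that mask can be used to filter the Series. The inclusive argument controls whether the boundaries count.

```python
import pandas as pd

s = pd.Series([5, 10, 15, 20, 25])

# Both boundaries are included by default.
mask = s.between(10, 20)
selected = s[mask]

# inclusive="neither" excludes both boundaries.
strict = s[s.between(10, 20, inclusive="neither")]

print(mask)
print(selected)
print(strict)
```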

#### g) String functions can be used to extract or modify text in a Series
These functions are accessed through the .str accessor of a Series.

-  Upper and Lower Function
-  Len Function
-  Strip Function
-  Split Function
-  Contains Function
-  Replace Function
-  Count Function
-  Startswith and Endswith Function
-  Find Function
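The functions listed above can be sketched as follows. The names reuse the index labels from the earlier example, with extra whitespace added to the first one so that strip has something to remove. All of these calls go through the .str accessor.

```python
import pandas as pd

names = pd.Series(["  Isha ", "Jaha", "Chavvi", "Mira"])

clean = names.str.strip()                  # remove leading/trailing whitespace
upper = clean.str.upper()                  # upper-case every element
lower = clean.str.lower()                  # lower-case every element
lengths = clean.str.len()                  # number of characters per element
has_h = clean.str.contains("h")            # True where "h" occurs
replaced = clean.str.replace("a", "@", regex=False)  # literal replacement
counts = clean.str.count("a")              # occurrences of "a" per element
starts = clean.str.startswith("M")         # True where the text starts with "M"
ends = clean.str.endswith("a")             # True where the text ends with "a"
found = clean.str.find("a")                # position of the first "a", -1 if absent

# split returns a list of pieces for each element.
parts = pd.Series(["day_1", "day_2"]).str.split("_")

print(upper.tolist())
print(parts.tolist())
```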

#### h) Converting a Series to List
Pandas tolist() is used to convert a Series to a Python list. Before the conversion, the object is of type pandas.core.series.Series.
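A short sketch of the conversion, with illustrative values, printing the type before and after:

```python
import pandas as pd

s = pd.Series([10, 20, 30])

# tolist() returns a plain Python list of the values; the index is dropped.
values = s.tolist()

print(type(s))
print(type(values))
print(values)
```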

### 2. Detailed Coding Implementations on Pandas DataFrame
A pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). It consists of three principal components: the data, the rows, and the columns.

#### a) Creating Data Frames
- In the real world, a pandas DataFrame is usually created by loading a dataset from existing storage, such as a SQL database, a CSV file, or an Excel file. A DataFrame can also be created from lists, from a dictionary, from a list of dictionaries, and so on. Here are some of the ways to create a DataFrame:

- Creating a DataFrame using a list:
A DataFrame can be created from a single list or from a list of lists.
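Both cases can be sketched as follows. The column names and values are hypothetical and chosen only for illustration. A single list produces one column, and each inner list of a list of lists becomes one row.

```python
import pandas as pd

# A single list becomes a one-column DataFrame.
single = pd.DataFrame(["a", "b", "c"], columns=["letter"])

# In a list of lists, each inner list is one row.
nested = pd.DataFrame(
    [["Isha", 1], ["Jaha", 2]],
    columns=["name", "score"],
)

print(single)
print(nested)
```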

#### Creating DataFrame from dict of ndarray/lists:
To create a DataFrame from a dict of ndarrays/lists, all the arrays must be of the same length. If an index is passed, its length must equal the length of the arrays. If no index is passed, the index defaults to range(n), where n is the array length.
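A minimal sketch of these rules, assuming made-up names and step counts. It shows the default index, a custom index of matching length, and the error raised when the arrays differ in length.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one list and one ndarray, both of length 4.
data = {
    "name": ["Isha", "Jaha", "Chavvi", "Mira"],
    "steps": np.array([4000, 3000, 10000, 7000]),
}

# No index passed: the index defaults to 0..n-1.
df_default = pd.DataFrame(data)

# Custom index: its length must match the arrays.
df_labeled = pd.DataFrame(data, index=["d1", "d2", "d3", "d4"])

# Arrays of different lengths are rejected with a ValueError.
try:
    pd.DataFrame({"a": [1, 2, 3], "b": [1, 2]})
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(df_default)
print(df_labeled)
print(mismatch_raised)
```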

- We can perform basic operations on the rows and columns of a DataFrame, such as selecting, deleting, adding, and renaming.
- Column selection: to select a column in a pandas DataFrame, we access it by its column name.
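The basic column operations mentioned above can be sketched like this. The DataFrame and its column names are hypothetical. Each call returns a new object and leaves df itself unchanged.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "name": ["Isha", "Jaha"],
    "age": [30, 25],
    "city": ["Pune", "Delhi"],
})

ages = df["age"]                    # selecting one column returns a Series
subset = df[["name", "city"]]       # selecting a list of columns returns a DataFrame

with_country = df.assign(country="India")        # adding a column
renamed = df.rename(columns={"city": "town"})    # renaming a column
dropped = df.drop(columns=["age"])               # deleting a column

print(subset)
print(renamed.columns.tolist())
```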

#### b) Slicing in DataFrames Using iloc and loc
Pandas provides many ways to select data, and loc and iloc are two of them. They are used to slice data from a pandas DataFrame and make it convenient to select rows and columns, including filtering the data according to some condition. Although they are often written as loc() and iloc(), both are indexers that are used with square brackets, for example df.loc[...] and df.iloc[...].

#### Basic loc Operations
loc is a label-based data selection method, which means that we pass the name (label) of the row or column that we want to select. Unlike iloc, it includes the last element of a range passed to it. loc also accepts boolean data, such as a boolean Series produced by a condition. Many selection operations can be performed using loc.
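A few label-based selections, sketched on a hypothetical DataFrame with row labels "a" to "d". Note that the label slice includes its end label and that a boolean condition can be passed directly.

```python
import pandas as pd

# Hypothetical data with string row labels.
df = pd.DataFrame(
    {"one": [10, 20, 30, 40], "two": [5, 15, 25, 35]},
    index=["a", "b", "c", "d"],
)

row_b = df.loc["b"]                  # a single row, returned as a Series
rows_b_to_c = df.loc["b":"c"]        # label slice: "c" is included
cell = df.loc["c", "two"]            # a single value by row and column label
filtered = df.loc[df["one"] > 20]    # boolean Series used as a mask

print(rows_b_to_c)
print(cell)
print(filtered)
```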

#### Basic iloc Operations
iloc is an integer-position-based selection method, which means that we pass an integer position to select a specific row or column. Unlike loc, it does not include the last element of a range passed to it. iloc does not accept a boolean Series as a mask the way loc does, although a plain boolean array (for example, a NumPy array) is accepted.
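The same kind of selections by position, sketched on a hypothetical DataFrame. The position slice excludes its end, and the boolean mask is converted to a NumPy array before it is passed to iloc.

```python
import pandas as pd

# Hypothetical data with string row labels.
df = pd.DataFrame(
    {"one": [10, 20, 30, 40], "two": [5, 15, 25, 35]},
    index=["a", "b", "c", "d"],
)

first_row = df.iloc[0]           # first row by position
rows = df.iloc[1:3]              # positions 1 and 2 only; position 3 is excluded
cell = df.iloc[2, 1]             # third row, second column
block = df.iloc[:2, [0]]         # first two rows, first column, as a DataFrame

# A boolean mask is passed as a plain array rather than as a Series.
mask = (df["one"] > 20).to_numpy()
masked = df.iloc[mask]

print(rows)
print(cell)
print(masked)
```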

#### c) Slicing Using Conditions
Conditions are typically used together with loc.

* So we could extract only those rows for which the value is more than 20.
* For the columns, we pass a list after the comma (,) to extract specific columns, here 'three' and 'four'.

Let's see another example
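A minimal sketch of conditional selection with loc. The DataFrame below is hypothetical: only the column names 'three' and 'four' and the threshold 20 come from the text, and the remaining names and all values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data; only 'three', 'four' and the threshold 20 come from the text.
df = pd.DataFrame({
    "one": [5, 15, 25, 35],
    "two": [10, 20, 30, 40],
    "three": [12, 22, 32, 42],
    "four": [8, 18, 28, 38],
})

# Rows where 'one' is more than 20; after the comma, only two columns are kept.
result = df.loc[df["one"] > 20, ["three", "four"]]

# Conditions can be combined with & (and) or | (or); each needs its own parentheses.
combined = df.loc[(df["one"] > 10) & (df["two"] < 40), ["three", "four"]]

print(result)
print(combined)
```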