
# Pandas in Python

A Series is a one-dimensional Pandas data structure built on top of a NumPy array.

1. In addition to the array of data, a Series has an index and an optional name.
2. Series can be created from other data types, but are usually imported from external sources.
3. Two or more Series grouped together form a Pandas DataFrame.
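
As a rough sketch of point 3, two Series that share the same index can be placed side by side with `pd.concat`. The `units` and `prices` Series and their values are made up for illustration and are not part of the tutorial's data.

```python
import pandas as pd

# Two Series with the same index labels (illustrative values only).
units = pd.Series([0, 5, 155], index=["coffee", "banana", "tea"], name="Sales")
prices = pd.Series([3.5, 0.25, 2.0], index=["coffee", "banana", "tea"], name="Price")

# Concatenating along axis=1 puts each Series in its own column;
# the Series names become the column names.
df = pd.concat([units, prices], axis=1)
print(df.columns.tolist())
print(df.shape)
```

Each column of the resulting DataFrame can be pulled back out as a Series, for example with `df["Sales"]`.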

In [None]:
import numpy as np
import pandas as pd

sales = [0,5,155,0,518,0]

# pd.Series() converts Python lists
# and NumPy arrays into a Pandas Series.
sales_series = pd.Series(sales, name="False")
sales_series

0      0
1      5
2    155
3      0
4    518
5      0
Name: False, dtype: int64


A Pandas Series has four key properties:
1. values: the data array in the series.
2. index: the index array in the series.
3. name: an optional name for the series.
4. dtype: the data type of the elements in the values array.
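
The four properties listed above can be inspected directly as attributes. This is a minimal sketch: the variable `s` and the name `"Sales"` are chosen here for illustration.

```python
import pandas as pd

s = pd.Series([0, 5, 155, 0, 518, 0], name="Sales")

print(s.values)  # the underlying NumPy array
print(s.index)   # a default integer RangeIndex, since no index was given
print(s.name)    # the optional name passed to the constructor
print(s.dtype)   # an integer dtype, inferred from the data
```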

In [None]:
print(sales_series.mean())
print(sales_series.values)
print(sales_series.sum())

113.0
[  0   5 155   0 518   0]
678


In [None]:
# Convert datatypes of the elements in the pandas series.
print(sales_series.astype('float'))
print(sales_series.astype('bool'))

0      0.0
1      5.0
2    155.0
3      0.0
4    518.0
5      0.0
Name: False, dtype: float64
0    False
1     True
2     True
3    False
4     True
5    False
Name: False, dtype: bool


In Pandas, the index is used to access each row in a Series or a DataFrame.
We can also assign a custom index to a Series or DataFrame and use its labels to access the rows.

In [None]:
sales = [0,5,155,0,518]
items=["Coffee","banana","tea","coconut","sugar"]
sales_series = pd.Series(sales, index=items, name="Sales")
print(sales_series)
print(sales_series['tea'])

Coffee       0
banana       5
tea        155
coconut      0
sugar      518
Name: Sales, dtype: int64
155


The .iloc[] accessor is the preferred way to access values by their positional index.
1. It works even when a Series has a custom, non-integer index.
2. It is explicit about using positions, whereas plain square brackets can be ambiguous between labels and positions.


In [None]:
# On a DataFrame: df.iloc[row_position, column_position]
print(sales_series.iloc[2])  # Accesses the 3rd element (positions start at 0)
print(sales_series.iloc[1:])  # Everything from position 1 onward

155
banana       5
tea        155
coconut      0
sugar      518
Name: Sales, dtype: int64


The .loc[] accessor is the preferred way to access values by their custom labels.

In [None]:
# df.loc[row_label, column_label]
print(sales_series.loc["tea"])
print(sales_series.loc["banana":"coconut"]) # Both endpoints are included

155
banana       5
tea        155
coconut      0
Name: Sales, dtype: int64
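
The difference between the two accessors shows up most clearly in slices: a label slice with .loc[] includes its end label, while a positional slice with .iloc[] stops before its end position. A small sketch, rebuilding the same series under the illustrative name `s`:

```python
import pandas as pd

s = pd.Series([0, 5, 155, 0, 518],
              index=["Coffee", "banana", "tea", "coconut", "sugar"],
              name="Sales")

by_label = s.loc["banana":"coconut"]  # end label "coconut" is included
by_position = s.iloc[1:3]             # end position 3 is excluded

print(by_label.index.tolist())
print(by_position.index.tolist())
```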


It is possible to have duplicate index values in a Pandas Series or DataFrame.

- Accessing a duplicated label using .loc[] returns all corresponding rows.

In [None]:
sales = [0,5,155,0,518]
items = ["Coffee","Coffee","tea","coconut","sugar"]

sales_series = pd.Series(sales, index=items, name="Sales")
sales_series

Coffee       0
Coffee       5
tea        155
coconut      0
sugar      518
Name: Sales, dtype: int64


In [None]:
sales_series.loc["Coffee"]
# All the values that have "Coffee" as their index will be returned.

Coffee    0
Coffee    5
Name: Sales, dtype: int64
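
Because a duplicated label changes what .loc[] returns, it can help to check the index first. The sketch below uses `Index.is_unique` and `Index.duplicated()`; the variable name `s` is illustrative.

```python
import pandas as pd

s = pd.Series([0, 5, 155, 0, 518],
              index=["Coffee", "Coffee", "tea", "coconut", "sugar"],
              name="Sales")

print(s.index.is_unique)              # False, because "Coffee" appears twice
print(s.index.duplicated().tolist())  # flags repeats after the first occurrence

# A unique label returns a single value; a duplicated label returns a Series.
print(type(s.loc["tea"]))
print(type(s.loc["Coffee"]))
```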


You can reset the index of a Pandas Series or DataFrame back to the default range of integers by using the .reset_index() method.

- By default, the existing index becomes a new column, so calling it on a Series returns a DataFrame.

In [None]:
sales_series.reset_index()

     index  Sales
0   Coffee      0
1   Coffee      5
2      tea    155
3  coconut      0
4    sugar    518
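
If the old labels are not needed, `reset_index` also accepts `drop=True`, which discards them instead of turning them into a column. A brief sketch on a shortened, illustrative series:

```python
import pandas as pd

s = pd.Series([0, 5, 155], index=["Coffee", "tea", "sugar"], name="Sales")

as_frame = s.reset_index()            # old index kept as a column named "index"
as_series = s.reset_index(drop=True)  # old index discarded; result is still a Series

print(type(as_frame).__name__, as_frame.columns.tolist())
print(type(as_series).__name__, as_series.index.tolist())
```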


## Filtering Series

You can filter a series by passing a logical test (a Boolean mask) into the .loc[] accessor.

In [None]:
sales_series.loc[sales_series>0]

Coffee      5
tea       155
sugar     518
Name: Sales, dtype: int64


Pandas provides comparison methods that return a Boolean mask, which can be used to show only specific elements of the series.

- Equal: .eq()
- Not Equal: .ne()
- Less Than or Equal: .le()
- Less Than: .lt()
- Greater Than or Equal: .ge()
- Greater Than: .gt()
- Membership Test: .isin()
- Inverse Membership: ~ applied to the result of .isin()

In [None]:
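# isin() matching is case-sensitive: "coffee" does not match the label "Coffee",
# so only "tea" is found.
print(sales_series.index.isin(["coffee","tea"]))
print(~sales_series.index.isin(["coffee","tea"]))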

[False False  True False False]
[ True  True False  True  True]
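
The comparison methods listed above can be passed straight into .loc[] to filter, and masks can be combined with `&` (and) or `|` (or). The sketch below uses a series with unique labels; the variable names are chosen here for illustration.

```python
import pandas as pd

s = pd.Series([0, 5, 155, 0, 518],
              index=["Coffee", "banana", "tea", "coconut", "sugar"],
              name="Sales")

at_least_five = s.loc[s.ge(5)]   # equivalent to s.loc[s >= 5]
zero_sales = s.loc[s.eq(0)]      # equivalent to s.loc[s == 0]

# Membership tests on the index labels, and the inverse with ~
selected = s.loc[s.index.isin(["tea", "sugar"])]
not_selected = s.loc[~s.index.isin(["tea", "sugar"])]

# Two tests combined into one mask
mid_range = s.loc[s.gt(0) & s.lt(500)]

print(at_least_five.index.tolist())
print(mid_range.index.tolist())
```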


## Sorting Series

A series can be sorted either by its values or by its index.

In [None]:
# sort_values() method sorts a Series by its values

# Ascending Sort
print(sales_series.sort_values())
# Descending Sort
print(sales_series.sort_values(ascending=False))

Coffee       0
coconut      0
Coffee       5
tea        155
sugar      518
Name: Sales, dtype: int64
sugar      518
tea        155
Coffee       5
Coffee       0
coconut      0
Name: Sales, dtype: int64


In [None]:
# sort_index() method sorts a Series by its index

# Ascending Sort
print(sales_series.sort_index())
# Descending Sort
print(sales_series.sort_index(ascending=False))

Coffee       0
Coffee       5
coconut      0
sugar      518
tea        155
Name: Sales, dtype: int64
tea        155
sugar      518
coconut      0
Coffee       0
Coffee       5
Name: Sales, dtype: int64
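
In the output above, "Coffee" sorts before "coconut" because uppercase letters come before lowercase letters in the default string ordering. One way to get a case-insensitive order is the `key` argument of `sort_index`, which this sketch assumes is available (pandas 1.1 or later). The series with unique labels is rebuilt under the illustrative name `s`.

```python
import pandas as pd

s = pd.Series([0, 5, 155, 0, 518],
              index=["Coffee", "banana", "tea", "coconut", "sugar"],
              name="Sales")

# Default ordering: the uppercase "C" sorts ahead of all lowercase labels.
default_order = s.sort_index()

# The key function receives the Index and returns the values to sort by.
case_insensitive = s.sort_index(key=lambda idx: idx.str.lower())

print(default_order.index.tolist())
print(case_insensitive.index.tolist())
```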
