# Pandas For Data Science

## Introduction

##### What is Pandas?

- Pandas is a Python library for data analysis.
- It provides high-performance, easy-to-use data structures and data analysis tools.
- Pandas is built on top of NumPy, which provides the efficient array storage and vectorized operations that pandas data structures rely on.
- Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.
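To make the NumPy relationship concrete, here is a minimal sketch. The `prices` data and its labels are made up for illustration; the point is only that a Series exposes its values as a NumPy array and that NumPy functions can be applied to it directly.

```python
import numpy as np
import pandas as pd

# A Series pairs a one-dimensional array of values with an index of labels.
# The values and labels here are hypothetical.
prices = pd.Series([1.5, 2.0, 3.25], index=["a", "b", "c"])

# to_numpy() returns the underlying values as a NumPy array.
values = prices.to_numpy()
print(type(values))
print(values.dtype)

# NumPy functions work element-wise on a Series and keep its index.
roots = np.sqrt(prices)
print(roots)
```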

##### Features of Pandas

- Fast and efficient DataFrame object with default and customized indexing.
- Tools for loading data into in-memory data objects from different file formats.
- Data alignment and integrated handling of missing data.
- Reshaping and pivoting of data sets.
- Label-based slicing, indexing and subsetting of large data sets.
- Columns can be inserted into or deleted from a data structure.
- Group-by functionality for aggregation and transformation.
- High performance merging and joining of data.
- Time Series functionality.
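A couple of these features can be sketched in a few lines. The `sales` table below is invented for illustration; it shows missing-data handling followed by a group-by aggregation, not a complete tour of the list above.

```python
import pandas as pd

# Hypothetical sales data; the None is stored as NaN (a missing value).
sales = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "units": [10, None, 30, 5],
})

# Missing-data handling: count the gaps, then fill them with 0.
missing_count = int(sales["units"].isna().sum())
filled = sales.fillna({"units": 0})

# Group by a column and aggregate each group.
totals = filled.groupby("region")["units"].sum()

print(missing_count)
print(totals)
```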

![image.png](attachment:image.png)

## Series & DataFrames

![image.png](attachment:image.png)

#### Pandas Series

In [None]:
import pandas as pd

# Creating a Series with a list of numbers
numbers = pd.Series([1, 2, 3, 4, 5])
print(numbers)

In [None]:
# Accessing the first element
print(numbers[0])

In [None]:
# Accessing a range of elements
print(numbers[1:4])

In [None]:
# Adding index labels to a Series
numbers = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
print(numbers)

In [None]:
# Accessing elements using index labels
print(numbers['c'])

In [None]:
# Boolean filtering: keep only the elements greater than 3
print(numbers[numbers > 3])

In [None]:
# Adding two Series
numbers = pd.Series([1, 2, 3, 4, 5])
numbers2 = pd.Series([6, 7, 8, 9, 10])
print(numbers + numbers2)
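The two Series above share the same default index, so the sum lines up position by position. Arithmetic between Series actually aligns on index labels, which matters once the labels differ. A small sketch with made-up labels:

```python
import pandas as pd

# Hypothetical Series whose labels only partly overlap.
left = pd.Series([1, 2, 3], index=["a", "b", "c"])
right = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Addition matches values by label; labels found in only one Series give NaN.
total = left + right
print(total)

# add() with fill_value treats a label missing from one side as 0 instead.
filled_total = left.add(right, fill_value=0)
print(filled_total)
```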

In [None]:
my_series = pd.Series([10, 20, 30, 40, 50])

# Applying a function to each element
# (for simple arithmetic like this, my_series ** 2 gives the same result and is faster)
my_squared_series = my_series.apply(lambda x: x**2)
print(my_squared_series)

In [None]:
my_dict = {"apples": 5, "oranges": 3, "bananas": 8}

# Creating a Series from the dictionary
my_series = pd.Series(my_dict)
print(my_series)

In [None]:
import pandas as pd

# Creating a Series
my_series = pd.Series([10, 20, 30, 40, 50])

# Filtering the Series
my_filtered_series = my_series[my_series > 30]
print(my_filtered_series)

#### Pandas DataFrame

Creating a DataFrame

In [None]:
# Create DataFrame from a dictionary

my_dict = {"apples": [5,8], "oranges": [3,4]}

df = pd.DataFrame(my_dict)

print(df)

type(df)

In [None]:
# Create a DataFrame with custom row labels

my_dict = {"apples": [5,8], "oranges": [3,4]}

df = pd.DataFrame(my_dict, index=['A','B'])

df

In [None]:
# Create a DataFrame from a CSV file

reviews = pd.read_csv("C:/Users/91926/OneDrive/Desktop/winemag-data-130k-v2.csv")

reviews.head()

Indexing, Selecting & Assigning

In [None]:
# Access a column by attribute; the bracket form below is equivalent and also works for names containing spaces

# reviews['price']

reviews.price

In [None]:
# Access a single element: select the column first, then the row labelled 0

reviews['country'][0]

Indexing In Pandas

In [None]:
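# Index-based (positional) selection with iloc; uncomment a line to try it

# reviews.iloc[0]              # first row

# reviews.iloc[:, 5]           # every row of the column at position 5

# reviews.iloc[:3, 5]          # first three rows of that column

# reviews.iloc[1:3, 5]         # rows at positions 1 and 2 (the end is excluded)

# reviews.iloc[[0, 5, 10], 5]  # rows at positions 0, 5 and 10

# reviews.iloc[-5:]            # last five rows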

In [None]:
# Label-Based Selection

# reviews.loc[0, 'country']   # the 'country' value in the row labelled 0

reviews.loc[:, ['taster_name', 'taster_twitter_handle', 'points']]


##### loc vs iloc

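- loc is label based: it selects rows and columns by their labels, and slices include the end label.
- iloc is integer-position based: it selects rows and columns by position, and slices exclude the end position.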

##### When to use loc vs iloc

- If you're working with a DataFrame that has meaningful row and column labels, it's usually easier to use loc.
- If you're working with a DataFrame that doesn't have meaningful labels, or if you want to select rows and columns based on their positions, it's usually easier to use iloc.
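The difference is easiest to see on a small frame. This is a minimal sketch on a made-up DataFrame with string row labels (`w1` to `w4` are hypothetical); note how the slice end is included by `loc` and excluded by `iloc`.

```python
import pandas as pd

# Hypothetical data with meaningful row labels.
df = pd.DataFrame(
    {"points": [87, 90, 92, 85]},
    index=["w1", "w2", "w3", "w4"],
)

# loc slices by label and includes the end label "w3".
by_label = df.loc["w1":"w3", "points"]

# iloc slices by position and excludes the end position 2.
by_position = df.iloc[0:2, 0]

print(list(by_label.index))
print(list(by_position.index))
```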

In [None]:
# Use the title column as the index
# set_index returns a new DataFrame and leaves reviews itself unchanged
reviews.set_index("title")

Conditional Selection

In [None]:
# Boolean Series: True for each row where the country is Italy

reviews.country == 'Italy'

In [None]:
# Rows where the country is Italy

reviews.loc[reviews.country=='Italy']

In [None]:
# Rows where the country is Italy and points >= 90 (each condition needs its own parentheses)

reviews.loc[(reviews.country=='Italy') & (reviews.points >= 90)]

In [None]:
# Rows where the country is Italy or points >= 90

reviews.loc[(reviews.country=='Italy') | (reviews.points >= 90)]

In [None]:
# Rows for wines made in France or Italy, using isin

reviews.loc[reviews.country.isin(['France','Italy'])]

In [None]:
# Rows where price is not missing

reviews.loc[reviews.price.notnull()]
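The cells above depend on a CSV stored at a local path. For a version that runs anywhere, here is a sketch of the same selections on a tiny made-up table; the `wines` frame and its values are hypothetical and only mirror the column names used above.

```python
import pandas as pd

# Hypothetical stand-in for the reviews data.
wines = pd.DataFrame({
    "country": ["Italy", "France", "Italy", "US"],
    "points": [91, 88, 85, 93],
    "price": [20.0, None, 15.0, 40.0],
})

# Each comparison yields a boolean Series, one value per row.
is_italian = wines["country"] == "Italy"
high_score = wines["points"] >= 90

# Combine conditions with & (and) and | (or), then pass the mask to loc.
both = wines.loc[is_italian & high_score]
either = wines.loc[is_italian | high_score]

# notna() is True wherever a value is present.
priced = wines.loc[wines["price"].notna()]

print(len(both), len(either), len(priced))
```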

Assigning Data

In [None]:
# Assign a constant: every row of the new column gets the same value
reviews['critic'] = 'everyone'
reviews['critic']

In [None]:
# Assign an iterable of values: counts down from len(reviews) to 1
reviews['index_backwards'] = range(len(reviews), 0, -1)
reviews['index_backwards']