# What is Pandas?

Imagine you walk into your house, and there's data everywhere: your laundry, your bills, your old mixtapes. That's life without Pandas. Chaos.

Now imagine Pandas as your super organized cousin who walks in and says:

"Why you living like this? Let me handle it!"

Suddenly, your life's a spreadsheet, and everything's labelled, sorted, and ready to impress your mom.

Pandas is a Python library that helps you work with data in a structured way. It's like a superpower for 
data manipulation and analysis. With Pandas, you can easily read, write, and manipulate data in various formats, 
like CSV, Excel, and JSON. It's like having a personal data butler who keeps everything tidy and organized.

In [3]:
# Importing pandas and numpy with alias
import pandas as pd
import numpy as np

# DataFrames and Series
DataFrames: A DataFrame is a two-dimensional, labeled data structure in Pandas that can store data of different types (such as integers, floats, and strings) in columns. It's essentially a table with rows and columns, where each column can have a different data type.

Series: A Series is a one-dimensional labeled array in Pandas. It is like a single column of a DataFrame but can also exist independently. Each element in a Series is associated with an index, which labels the elements.
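
To make the relationship between the two structures concrete, here is a minimal sketch. The table contents are invented for illustration. It shows that selecting one column of a DataFrame gives back a Series, that each column keeps its own data type, and that the Series shares the DataFrame's index.

```python
import pandas as pd

# Hypothetical table, invented for this illustration
table = pd.DataFrame(
    {"item": ["pen", "book"], "price": [1.5, 12.0], "qty": [10, 2]}
)

# Selecting one column gives back a Series
price = table["price"]
print(type(price).__name__)

# Each column has its own data type
print(table.dtypes)

# The Series shares the DataFrame's row labels (its index)
print(price.index.equals(table.index))
```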

## 1. Creating Series and DataFrames

In [4]:
# Creating a series from a list
data = [10, 20, 30, 40, 50]
series = pd.Series(data)
series

0    10
1    20
2    30
3    40
4    50
dtype: int64

In [5]:
# Creating a series from a dictionary
data_dict = {'a':1, 'b':2, 'c':3}
series_from_dict = pd.Series(data_dict)
series_from_dict

a    1
b    2
c    3
dtype: int64
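
As a small sketch of what the labels buy you, the dictionary keys become the index of the Series, so values can be looked up by label as well as by position. The variable names below are chosen for illustration only.

```python
import pandas as pd

scores = pd.Series({"a": 1, "b": 2, "c": 3})

by_label = scores["b"]        # look up by index label
by_position = scores.iloc[0]  # look up by integer position
labels = list(scores.index)   # the dictionary keys became the index

# Arithmetic is applied element-wise and keeps the labels
doubled = scores * 2
print(doubled)
```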

In [6]:
# Creating a DataFrame
data = {
    'Name' : ['John', 'Jane', 'Paul', 'Anna'],
    'Age' : [28, 16, 23, 42],
    'City' : ['New York', 'California', 'Paris', 'London']
}
df = pd.DataFrame(data)
df

,Name,Age,City
0,John,28,New York
1,Jane,16,California
2,Paul,23,Paris
3,Anna,42,London


In [7]:
# Specifying Custom Index

df_custom_index = pd.DataFrame(data, index=['a', 'b', 'c', 'd'])
df_custom_index

,Name,Age,City
a,John,28,New York
b,Jane,16,California
c,Paul,23,Paris
d,Anna,42,London
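
A custom index is mostly useful for lookups. The sketch below rebuilds the same table and shows two things: selecting rows by their custom labels with `loc`, and promoting an existing column to the index with `set_index`. The variable names are illustrative.

```python
import pandas as pd

data = {
    "Name": ["John", "Jane", "Paul", "Anna"],
    "Age": [28, 16, 23, 42],
    "City": ["New York", "California", "Paris", "London"],
}
people = pd.DataFrame(data, index=["a", "b", "c", "d"])

# Rows can now be looked up by their custom labels
row_b = people.loc["b"]
age_of_d = people.loc["d", "Age"]

# An existing column can also be promoted to the index
by_name = people.set_index("Name")
paul_city = by_name.loc["Paul", "City"]
print(paul_city)
```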


In [8]:
# Create DataFrame from file
df = pd.read_csv('sales_data.csv')
df.head(5)

,Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price,Total Revenue,Region,Payment Method
0,10001,2024-01-01,Electronics,iPhone 14 Pro,2,999.99,1999.98,North America,Credit Card
1,10002,2024-01-02,Home Appliances,Dyson V11 Vacuum,1,499.99,499.99,Europe,PayPal
2,10003,2024-01-03,Clothing,Levi's 501 Jeans,3,69.99,209.97,Asia,Debit Card
3,10004,2024-01-04,Books,The Da Vinci Code,4,15.99,63.96,North America,Credit Card
4,10005,2024-01-05,Beauty Products,Neutrogena Skincare Set,1,89.99,89.99,Europe,PayPal
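
If you don't have `sales_data.csv` at hand, the following sketch may help you experiment. It assumes a cut-down version of the file: two rows and five columns copied from the output above, held in an in-memory `io.StringIO` buffer instead of a file on disk. It also shows two optional `read_csv` arguments, `parse_dates` and `index_col`, which the original cell does not use.

```python
import io
import pandas as pd

# Two rows copied from the output above; StringIO stands in for the file
csv_text = """Transaction ID,Date,Product Category,Units Sold,Unit Price
10001,2024-01-01,Electronics,2,999.99
10002,2024-01-02,Home Appliances,1,499.99
"""

sales = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["Date"],        # convert the Date column to datetime
    index_col="Transaction ID",  # use the ID as the row label
)

print(sales.dtypes)
print(sales.shape)
```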


## 2. Indexing and Selecting Data

In [9]:
# Selecting a column using the column name
df['Units Sold']

0      2
1      1
2      3
3      4
4      1
      ..
235    1
236    3
237    3
238    1
239    2
Name: Units Sold, Length: 240, dtype: int64

In [10]:
# For a DataFrame, passing a slice selects the matching rows
df[0:3]

,Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price,Total Revenue,Region,Payment Method
0,10001,2024-01-01,Electronics,iPhone 14 Pro,2,999.99,1999.98,North America,Credit Card
1,10002,2024-01-02,Home Appliances,Dyson V11 Vacuum,1,499.99,499.99,Europe,PayPal
2,10003,2024-01-03,Clothing,Levi's 501 Jeans,3,69.99,209.97,Asia,Debit Card


In [16]:
# Selection by label using DataFrame.loc[] or DataFrame.at[]
df.loc[1] # Single label
df.loc[[1, 2]] # List of labels
df.loc[0,'Transaction ID'] # Single label for row and column
df.loc[0:10,'Product Category'] # Label slice for rows (both endpoints included) and a single label for the column
df.at[4,'Unit Price'] # Fast access to a scalar by label

np.float64(89.99)

In [23]:
# Selection by position using DataFrame.iloc[] and DataFrame.iat[]
df.iloc[3] # Single integer position: the fourth row
df.iloc[2:7,0:3] # Integer slices for rows and columns (end positions excluded)
df.iloc[[1,2,3],[3,6,8]] # Lists of integer positions
df.iloc[1:3,:] # Slicing rows explicitly
df.iloc[:,1:3] # Slicing columns explicitly
df.iloc[1,5] # Getting a single value by position
df.iat[1,2] # Fast access to a scalar by position

'Home Appliances'
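
One difference between `loc` and `iloc` is easy to trip over: a label slice includes its end point, while a position slice does not. The sketch below uses a tiny made-up frame to show this, plus a second frame with a non-default index where labels and positions no longer coincide.

```python
import pandas as pd

frame = pd.DataFrame({"value": [10, 20, 30, 40, 50]})

by_label = frame.loc[0:2, "value"]  # labels 0, 1 and 2 (end included)
by_position = frame.iloc[0:2, 0]    # positions 0 and 1 (end excluded)

# With a non-default index, labels and positions differ
shifted = pd.DataFrame(
    {"value": [10, 20, 30, 40, 50]}, index=[10, 11, 12, 13, 14]
)
first_row = shifted.iloc[0]  # first row by position
row_12 = shifted.loc[12]     # row whose label is 12

print(len(by_label), len(by_position))
```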

In [26]:
# Boolean Indexing
df[df["Units Sold"] > 2] # Select rows where Units Sold is greater than 2.


,Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price,Total Revenue,Region,Payment Method
2,10003,2024-01-03,Clothing,Levi's 501 Jeans,3,69.99,209.97,Asia,Debit Card
3,10004,2024-01-04,Books,The Da Vinci Code,4,15.99,63.96,North America,Credit Card
5,10006,2024-01-06,Sports,Wilson Evolution Basketball,5,29.99,149.95,Asia,Credit Card
8,10009,2024-01-09,Clothing,Nike Air Force 1,6,89.99,539.94,Asia,Debit Card
11,10012,2024-01-12,Sports,Babolat Pure Drive Tennis Racket,3,199.99,599.97,Asia,Credit Card
...,...,...,...,...,...,...,...,...,...
225,10226,2024-08-13,Books,The Silent Patient by Alex Michaelides,3,26.99,80.97,North America,Credit Card
230,10231,2024-08-18,Clothing,Adidas Originals Trefoil Hoodie,4,64.99,259.96,Asia,Debit Card
233,10234,2024-08-21,Sports,Hydro Flask Standard Mouth Water Bottle,3,32.95,98.85,Asia,Credit Card
236,10237,2024-08-24,Clothing,Nike Air Force 1 Sneakers,3,90.00,270.00,Asia,Debit Card
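
Boolean indexing also works with more than one condition. The sketch below uses a handful of rows copied from the outputs above (three columns only, with fresh row labels 0 to 4), so it runs without the CSV file. Combining conditions with `&` and testing membership with `isin()` are standard pandas techniques, although the original cell does not show them.

```python
import pandas as pd

# A few rows copied from the outputs above (subset of columns)
sales = pd.DataFrame(
    {
        "Product Category": ["Electronics", "Home Appliances", "Clothing", "Books", "Sports"],
        "Units Sold": [2, 1, 3, 4, 5],
        "Region": ["North America", "Europe", "Asia", "North America", "Asia"],
    }
)

# The comparison produces a boolean Series, one True/False per row
mask = sales["Units Sold"] > 2

# Combine conditions with & (and) or | (or); each needs its own parentheses
asia_bulk = sales[(sales["Units Sold"] > 2) & (sales["Region"] == "Asia")]

# isin() tests membership in a list of values
books_or_sports = sales[sales["Product Category"].isin(["Books", "Sports"])]
print(asia_bulk)
```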


## 3. Missing Data

For NumPy data types, np.nan represents missing data. By default, missing values are excluded from computations such as sums and means.
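
A quick sketch of that default behaviour, using a made-up three-element Series: reductions skip NaN unless `skipna=False` is passed, and `count()` reports only the non-missing values.

```python
import numpy as np
import pandas as pd

values = pd.Series([1.0, np.nan, 3.0])

total = values.sum()               # NaN is skipped: 1.0 + 3.0
average = values.mean()            # mean of the two present values
strict = values.sum(skipna=False)  # NaN propagates when skipna=False
present = values.count()           # number of non-missing values

print(total, average, strict, present)
```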

In [31]:
df2 = df.copy()

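# DataFrame.dropna() returns a copy without the rows that have missing data.
# It does not modify df2 unless the result is assigned back:
df2.dropna(how='any')
df2

# DataFrame.fillna() returns a copy with missing data replaced by the given value:
df2.fillna(value=5)

# isna() gets the boolean mask where values are NaN:
pd.isna(df2)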

,Transaction ID,Date,Product Category,Product Name,Units Sold,Unit Price,Total Revenue,Region,Payment Method
0,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...
235,False,False,False,False,False,False,False,False,False
236,False,False,False,False,False,False,False,False,False
237,False,False,False,False,False,False,False,False,False
238,False,False,False,False,False,False,False,False,False
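
The mask above is all False, so the methods have nothing to act on in this dataset. To see them do something, here is a minimal sketch on a made-up frame that does contain gaps. The column names and values are hypothetical. It also shows that `dropna()` and `fillna()` return new objects, and that `isna().sum()` gives a per-column count of missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one gap in each column
frame = pd.DataFrame(
    {"units": [2.0, np.nan, 3.0], "region": ["Asia", "Europe", None]}
)

# Both methods return a new DataFrame; frame itself is unchanged
dropped = frame.dropna(how="any")
filled = frame.fillna({"units": 0, "region": "Unknown"})

# Count missing values per column
missing_per_column = frame.isna().sum()
print(missing_per_column)
```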
