# Reading and Exploring Data with Pandas

## 2.1 Reading Data into Pandas

In [29]:
# importing pandas library

import pandas as pd

print("Pandas imported successfully")

Pandas imported successfully


#### 1. Loading CSV Files

In [30]:
# Method 1: load from a local file on your computer
# (forward slashes work on every OS and avoid backslash escape problems like "\s")

df = pd.read_csv("datasets/sales_data.csv")

print("Datasets loaded successfully")




Datasets loaded successfully
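
If you would rather not type path separators by hand, `pathlib.Path` can build the path for you, and `pd.read_csv` accepts a `Path` object as well as a string. The sketch below is only an illustration: it writes a tiny stand-in CSV into a temporary folder (the folder and its contents are assumptions, not the real `sales_data.csv`) so that it runs anywhere.

```python
from pathlib import Path
import tempfile

import pandas as pd

# Hypothetical stand-in for the datasets folder: a temporary directory
# holding a tiny CSV, so this example does not need the real file.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "datasets"        # "/" joins path parts portably
    data_dir.mkdir()
    csv_path = data_dir / "sales_data.csv"
    csv_path.write_text(
        "Date,Product,Price\n"
        "2024-02-11,Croissant,2.87\n"
        "2024-01-07,Bagel,2.40\n"
    )

    # read_csv accepts a Path object directly
    sample_df = pd.read_csv(csv_path)

print(sample_df)
```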


#### 2. Loading Excel Files

In [31]:
# load the first sheet of an Excel workbook (the default when sheet_name is not given)

df = pd.read_excel("datasets/student_grades.xlsx")

print("Student grades excel sheet loaded successfully")


# individual worksheets are loaded by name in the next cell



Student grades excel sheet loaded successfully


In [32]:
# load the January worksheet from the Excel workbook

january_report = pd.read_excel("datasets/monthly_reports.xlsx", sheet_name="January")

print("January sheet was loaded successfully")


# load the February worksheet from the Excel workbook

february_report = pd.read_excel("datasets/monthly_reports.xlsx", sheet_name="February")

print("February sheet was loaded successfully")


# load the March worksheet from the Excel workbook

march_report = pd.read_excel("datasets/monthly_reports.xlsx", sheet_name="March")

print("March sheet was loaded successfully")

January sheet was loaded successfully
February sheet was loaded successfully
March sheet was loaded successfully


## 2.2 Exploring DataFrames

In [33]:
# 1. load sales data

df = pd.read_csv("datasets/sales_data.csv")

print("Loaded successfully")

Loaded successfully


In [36]:
# print the top 5 rows
print("Top 5 rows")
df.head(5)

Top 5 rows


         Date    Product  Category  Price  Quantity  Customer_Type
0  2024-02-11  Croissant      Food   2.87         5            New
1  2024-01-07      Bagel      Food   2.40         5            New
2  2024-01-29      Mocha  Beverage   5.32         2            New
3  2024-02-14  Americano  Beverage   4.25         2            New
4  2024-02-21      Mocha  Beverage   3.15         1        Regular


In [38]:
# print last 5 rows

print("Last 5 rows")
df.tail(5)

Last 5 rows


          Date     Product  Category  Price  Quantity  Customer_Type
25  2024-02-10  Cappuccino  Beverage   2.89         3        Regular
26  2024-03-08      Muffin      Food   2.99         4            New
27  2024-03-29    Espresso  Beverage   5.45         5            New
28  2024-02-20    Espresso  Beverage   4.32         1            New
29  2024-03-13   Croissant      Food   3.03         5            New


### Understanding Your Data Structure

In [48]:
# get basic information about your dataset
# (df.info() prints its report itself and returns None, so no print() is needed)

print("Dataset Information")
df.info()



Dataset Information
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Date           30 non-null     object 
 1   Product        30 non-null     object 
 2   Category       30 non-null     object 
 3   Price          30 non-null     float64
 4   Quantity       30 non-null     int64  
 5   Customer_Type  30 non-null     object 
dtypes: float64(1), int64(1), object(4)
memory usage: 1.5+ KB
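
The report lists `Date` as `object`, which means pandas is treating the dates as plain text. If you want real dates (for example to pull out the month), one option is to convert the column. The sketch below is a minimal illustration on three rows copied from the top of the dataset, read from an in-memory string instead of the real file; the names `csv_text`, `sample_df` and `parsed_df` are made up for this example.

```python
import io

import pandas as pd

# Three rows copied from the head of the sales data, kept in memory
# so the example runs without the CSV file.
csv_text = """Date,Product,Category,Price,Quantity,Customer_Type
2024-02-11,Croissant,Food,2.87,5,New
2024-01-07,Bagel,Food,2.40,5,New
2024-01-29,Mocha,Beverage,5.32,2,New
"""

# Option 1: read as text first, then convert the column
sample_df = pd.read_csv(io.StringIO(csv_text))
sample_df["Date"] = pd.to_datetime(sample_df["Date"], format="%Y-%m-%d")

# Option 2: ask read_csv to parse the column while loading
parsed_df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

print(sample_df.dtypes)
print(sample_df["Date"].dt.month_name())
```

Once the column holds dates, the `.dt` accessor gives you parts such as the month or the day of the week.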


In [49]:
# get to know number of rows and columns you have.

print("Dataset shape")
df.shape



Dataset shape


(30, 6)

In [51]:
# get summary statistics for the numerical columns in the dataset

print("Summary statistics")
print(df.describe())



Summary statistics
           Price   Quantity
count  30.000000  30.000000
mean    3.750667   3.300000
std     0.920858   1.556964
min     2.400000   1.000000
25%     3.000000   2.000000
50%     3.660000   4.000000
75%     4.305000   5.000000
max     5.490000   5.000000
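
By default `describe()` only covers the numeric columns, so `Product`, `Category` and `Customer_Type` are missing from the summary. For text columns, `value_counts()` and `nunique()` are a simple way to get a similar overview. The sketch below is only an illustration using the five rows shown by `df.head(5)` earlier; `sample_df` and `category_counts` are names made up for this example.

```python
import pandas as pd

# The five rows shown by df.head(5), rebuilt by hand for illustration
sample_df = pd.DataFrame({
    "Product": ["Croissant", "Bagel", "Mocha", "Americano", "Mocha"],
    "Category": ["Food", "Food", "Beverage", "Beverage", "Beverage"],
    "Price": [2.87, 2.40, 5.32, 4.25, 3.15],
    "Quantity": [5, 5, 2, 2, 1],
})

# how many rows fall into each category
category_counts = sample_df["Category"].value_counts()
print(category_counts)

# how many different products appear
print("Distinct products:", sample_df["Product"].nunique())
```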


## 2.3 Selecting Data

Now that you know what's in your dataset, let's learn how to grab exactly the pieces you need. Think of
this as using a smart filter to find specific information.

#### Selecting columns

In [54]:
# select one column (the result is a Series)

product_column = df["Product"]
print("Product column:")
print(product_column)

Product column:
0      Croissant
1          Bagel
2          Mocha
3      Americano
4          Mocha
5      Americano
6          Latte
7      Americano
8         Muffin
9      Americano
10     Croissant
11         Bagel
12    Cappuccino
13      Espresso
14     Croissant
15     Americano
16      Espresso
17     Americano
18    Cappuccino
19      Espresso
20     Americano
21         Mocha
22     Americano
23     Croissant
24    Cappuccino
25    Cappuccino
26        Muffin
27      Espresso
28      Espresso
29     Croissant
Name: Product, dtype: object


In [55]:
# select multiple columns by passing a list of column names

product_price_quantity = df[["Product", "Price", "Quantity"]]

print(product_price_quantity)


       Product  Price  Quantity
0    Croissant   2.87         5
1        Bagel   2.40         5
2        Mocha   5.32         2
3    Americano   4.25         2
4        Mocha   3.15         1
5    Americano   5.48         4
6        Latte   4.39         5
7    Americano   5.49         5
8       Muffin   3.57         1
9    Americano   2.79         4
10   Croissant   3.69         4
11       Bagel   3.92         4
12  Cappuccino   3.72         2
13    Espresso   2.73         1
14   Croissant   2.54         5
15   Americano   3.33         1
16    Espresso   4.26         3
17   Americano   3.67         2
18  Cappuccino   4.84         2
19    Espresso   3.78         5
20   Americano   4.60         5
21       Mocha   3.65         2
22   Americano   3.60         2
23   Croissant   3.14         4
24  Cappuccino   2.66         5
25  Cappuccino   2.89         3
26      Muffin   2.99         4
27    Espresso   5.45         5
28    Espresso   4.32         1
29   Croissant   3.03         5
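
One detail that is easy to miss: single brackets give you a Series, while double brackets (a list of names) give you a DataFrame, even when the list holds just one column. The sketch below illustrates this on a hand-built three-row frame; `sample_df`, `as_series` and `as_frame` are names made up for this example.

```python
import pandas as pd

# Three rows copied from the top of the sales data, for illustration only
sample_df = pd.DataFrame({
    "Product": ["Croissant", "Bagel", "Mocha"],
    "Price": [2.87, 2.40, 5.32],
    "Quantity": [5, 5, 2],
})

as_series = sample_df["Product"]     # single brackets -> Series
as_frame = sample_df[["Product"]]    # list of names   -> DataFrame

print(type(as_series).__name__)
print(type(as_frame).__name__)
print(as_frame.shape)
```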


### Selecting Rows

#### Using .iloc[] (Position-Based Selection)

In [57]:
# select the first row by its integer position (positions start at 0)
first_row = df.iloc[0]
first_row

Date             2024-02-11
Product           Croissant
Category               Food
Price                  2.87
Quantity                  5
Customer_Type           New
Name: 0, dtype: object

In [59]:
# select positions 0, 1 and 2 (the end position 3 is excluded)
first_3_rows = df.iloc[0:3]
first_3_rows

         Date    Product  Category  Price  Quantity  Customer_Type
0  2024-02-11  Croissant      Food   2.87         5            New
1  2024-01-07      Bagel      Food   2.40         5            New
2  2024-01-29      Mocha  Beverage   5.32         2            New
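
`.iloc[]` can do a little more than pick rows from the top: negative positions count from the end, a list picks scattered rows, and a second slice picks columns by position. The sketch below is a small illustration using the five rows shown by `df.head(5)` earlier; `sample_df` and the result names are made up for this example.

```python
import pandas as pd

# The five rows shown by df.head(5), rebuilt by hand for illustration
sample_df = pd.DataFrame({
    "Date": ["2024-02-11", "2024-01-07", "2024-01-29", "2024-02-14", "2024-02-21"],
    "Product": ["Croissant", "Bagel", "Mocha", "Americano", "Mocha"],
    "Price": [2.87, 2.40, 5.32, 4.25, 3.15],
})

last_row = sample_df.iloc[-1]         # negative positions count from the end
corner = sample_df.iloc[0:3, 0:2]     # first three rows, first two columns
picked = sample_df.iloc[[0, 4]]       # a list selects specific positions

print(last_row)
print(corner)
print(picked)
```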


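#### Using .loc[] (Label-Based Selection)

Think of .loc as "I want the rows with these labels": it selects by index label (and optionally column name) rather than by position. It also accepts a condition, in which case it returns the rows where the condition is true.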
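
Here is a minimal sketch of `.loc[]` in action, using the five rows shown by `df.head(5)` earlier. Note that a label slice such as `1:3` includes the end label, unlike `.iloc[]`. The frame `sample_df`, the re-indexed copy `by_product` and the result names are made up for this example.

```python
import pandas as pd

# The five rows shown by df.head(5), rebuilt by hand for illustration
sample_df = pd.DataFrame({
    "Product": ["Croissant", "Bagel", "Mocha", "Americano", "Mocha"],
    "Category": ["Food", "Food", "Beverage", "Beverage", "Beverage"],
    "Price": [2.87, 2.40, 5.32, 4.25, 3.15],
})

row_2 = sample_df.loc[2]                                 # the row labelled 2
middle = sample_df.loc[1:3, ["Product", "Price"]]        # labels 1, 2 AND 3
pricey = sample_df.loc[sample_df["Price"] > 4.00, ["Product", "Price"]]

# Labels do not have to be numbers: use Product as the index instead
by_product = sample_df.set_index("Product")
bagel_price = by_product.loc["Bagel", "Price"]

print(middle)
print(pricey)
print("Bagel price:", bagel_price)
```

With the default `RangeIndex` the labels happen to match the positions, which is why `.loc[2]` and `.iloc[2]` return the same row here. They differ as soon as the index is changed or the rows are filtered.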

## Filtering Data (The Real Magic!) 

This is where pandas really shines – finding exactly what you're looking for:

In [62]:
# 1. filter the items priced above 4.00

expensive_items = df[df["Price"] > 4.00]
expensive_items.head(5)

         Date    Product  Category  Price  Quantity  Customer_Type
2  2024-01-29      Mocha  Beverage   5.32         2            New
3  2024-02-14  Americano  Beverage   4.25         2            New
5  2024-02-27  Americano  Beverage   5.48         4            New
6  2024-02-01      Latte  Beverage   4.39         5        Regular
7  2024-01-10  Americano  Beverage   5.49         5            New


In [63]:
# 2. show the rows where the customer type is "Regular"
regular_customer = df[df["Customer_Type"] == "Regular"]
regular_customer.head(5)

          Date    Product  Category  Price  Quantity  Customer_Type
4   2024-02-21      Mocha  Beverage   3.15         1        Regular
6   2024-02-01      Latte  Beverage   4.39         5        Regular
8   2024-01-06     Muffin      Food   3.57         1        Regular
10  2024-03-15  Croissant      Food   3.69         4        Regular
11  2024-02-18      Bagel      Food   3.92         4        Regular
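
Filters can also be combined. Use `&` for "and", `|` for "or", and wrap each condition in parentheses; `isin()` checks a column against a list of values. The sketch below illustrates this on the five rows shown by `df.head(5)` earlier; `sample_df` and the result names are made up for this example.

```python
import pandas as pd

# The five rows shown by df.head(5), rebuilt by hand for illustration
sample_df = pd.DataFrame({
    "Product": ["Croissant", "Bagel", "Mocha", "Americano", "Mocha"],
    "Category": ["Food", "Food", "Beverage", "Beverage", "Beverage"],
    "Price": [2.87, 2.40, 5.32, 4.25, 3.15],
    "Customer_Type": ["New", "New", "New", "New", "Regular"],
})

# AND: beverages priced above 4.00 (each condition needs its own parentheses)
pricey_beverages = sample_df[
    (sample_df["Category"] == "Beverage") & (sample_df["Price"] > 4.00)
]

# OR: cheap items or anything bought by a regular customer
cheap_or_regular = sample_df[
    (sample_df["Price"] < 3.00) | (sample_df["Customer_Type"] == "Regular")
]

# isin: rows whose product is in a given list
pastries = sample_df[sample_df["Product"].isin(["Croissant", "Bagel"])]

print(pricey_beverages)
print(cheap_or_regular)
print(pastries)
```

The parentheses matter because `&` and `|` bind more tightly than comparisons such as `>` and `==` in Python.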
