# **Pandas**

Pandas is your go-to tool for handling data: cleaning, transforming, and analyzing it.

With Pandas, you can:

- Load data from CSVs into a DataFrame (a table-like structure).
- Calculate statistics like mean, median, min, and max.
- Check correlations and data distributions.
- Clean data by removing missing values and filtering rows/columns.
- Visualize data using Matplotlib (bar charts, histograms, etc.).
- Save processed data back to a file.

Before modeling or building complex visualizations, you need to understand your dataset, and Pandas makes that easy.


Official documentation: https://pandas.pydata.org/docs/index.html

### **Import libraries**

In [None]:
import pandas as pd

In [None]:
# help(pd)

## **Creating/Reading DataFrames**

### **Creating DataFrames from scratch**

Creating small DataFrames by hand is useful for testing Pandas methods. A simple way is to pass a dictionary to `pd.DataFrame`: each key becomes a column name and each list becomes that column's values.


In [None]:
data = {"Customer_ID":[1,2,3],
        "Customer_names":["Ahmad","Aliyan","Usman"],
        "Age":[23,24,25]}

In [None]:
data

{'Customer_ID': [1, 2, 3],
 'Customer_names': ['Ahmad', 'Aliyan', 'Usman'],
 'Age': [23, 24, 25]}

In [None]:
customer_data = pd.DataFrame(data)

In [None]:
customer_data

Unnamed: 0,Customer_ID,Customer_names,Age
0,1,Ahmad,23
1,2,Aliyan,24
2,3,Usman,25


In [None]:
customer_data.to_csv("customer_data.csv")  # by default the row index is also written, as an unnamed first column

### **Read dataset**

In [None]:
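# pandas has readers for many formats: pd.read_csv, pd.read_excel, pd.read_json, pd.read_clipboard, ...
# The extra "Unnamed: 0" column below is the row index that to_csv() saved;
# pass index=False to to_csv(), or index_col=0 to read_csv(), to avoid it
pd.read_csv("/content/customer_data.csv")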

Unnamed: 0.1,Unnamed: 0,Customer_ID,Customer_names,Age
0,0,1,Ahmad,23
1,1,2,Aliyan,24
2,2,3,Usman,25
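The extra `Unnamed: 0` column comes from the row index that `to_csv()` writes by default. Below is a minimal sketch that reproduces this in memory and shows two ways to avoid it. An `io.StringIO` buffer stands in for the file (an assumption made so the example runs anywhere, without `/content/`).

```python
import io

import pandas as pd

customer_data = pd.DataFrame({
    "Customer_ID": [1, 2, 3],
    "Customer_names": ["Ahmad", "Aliyan", "Usman"],
    "Age": [23, 24, 25],
})

# With no path, to_csv() returns the CSV text; the index is written by default
with_index = customer_data.to_csv()
reread = pd.read_csv(io.StringIO(with_index))
print(reread.columns.tolist())  # the saved index comes back as an "Unnamed: 0" column

# Option 1: do not write the index at all
no_index = customer_data.to_csv(index=False)
round_trip = pd.read_csv(io.StringIO(no_index))

# Option 2: keep the index in the file, but read the first column back as the index
restored = pd.read_csv(io.StringIO(with_index), index_col=0)

print(round_trip.columns.tolist())
print(restored.columns.tolist())
```

Option 1 is usually simpler when the index is just the default 0, 1, 2, ... counter.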


In [None]:
df = pd.read_csv("/content/converted_data.csv")
df

Unnamed: 0,order_id,customer_name,product_name,quantity,order_date,city
0,301,Bilal Hussain,Laptop,2,2025-09-10,Islamabad
1,302,Maryam Shah,Smartphone,1,2025-09-11,Islamabad
2,303,Ahmed Khan,Headphones,4,2025-09-12,Islamabad
3,304,Sana Qureshi,Smartphone,3,2025-09-13,Islamabad
4,305,Omar Farooq,Laptop,1,2025-09-14,Islamabad


In [None]:
pd.read_excel("/content/converted_data.xls")  # legacy .xls files need the xlrd package; .xlsx files use openpyxl

Unnamed: 0,order_id,customer_name,product_name,quantity,order_date,city
0,301,Bilal Hussain,Laptop,2,2025-09-10,Islamabad
1,302,Maryam Shah,Smartphone,1,2025-09-11,Islamabad
2,303,Ahmed Khan,Headphones,4,2025-09-12,Islamabad
3,304,Sana Qureshi,Smartphone,3,2025-09-13,Islamabad
4,305,Omar Farooq,Laptop,1,2025-09-14,Islamabad


In [None]:
# To read a public Google Sheet: copy its share link and replace "edit?usp=sharing" with "export?format=csv"


In [None]:
url = "https://docs.google.com/spreadsheets/d/11zB2Q8-QupGoZrciiV00VC2T5PULJX7m9QP1uyjDots/export?format=csv"
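The replacement rule in the comment above can be wrapped in a small function. `share_link_to_csv_url` is a hypothetical helper, not part of pandas; it assumes the share link ends exactly in `edit?usp=sharing` (links with other endings, such as a `#gid=...` fragment, are not handled) and that the sheet is publicly viewable.

```python
SHARE_SUFFIX = "edit?usp=sharing"
EXPORT_SUFFIX = "export?format=csv"


def share_link_to_csv_url(share_url: str) -> str:
    """Hypothetical helper: turn a Google Sheets share link into its CSV export link."""
    if not share_url.endswith(SHARE_SUFFIX):
        raise ValueError(f"expected a link ending in {SHARE_SUFFIX!r}")
    # Keep everything before the suffix and append the export suffix
    return share_url[: -len(SHARE_SUFFIX)] + EXPORT_SUFFIX


share_url = "https://docs.google.com/spreadsheets/d/11zB2Q8-QupGoZrciiV00VC2T5PULJX7m9QP1uyjDots/edit?usp=sharing"
csv_url = share_link_to_csv_url(share_url)
print(csv_url)
```

The resulting `csv_url` can then be passed to `pd.read_csv`, as in the next cell.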

In [None]:
data = pd.read_csv(url)

In [None]:
data

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
0,141234.0,iPhone,1.0,700.00,01/22/19 21:25,"944 Walnut St, Boston, MA 02215"
1,141235.0,Lightning Charging Cable,1.0,14.95,01/28/19 14:15,"185 Maple St, Portland, OR 97035"
2,141236.0,Wired Headphones,2.0,11.99,01/17/19 13:33,"538 Adams St, San Francisco, CA 94016"
3,141237.0,27in FHD Monitor,1.0,149.99,01/05/19 20:33,"738 10th St, Los Angeles, CA 90001"
4,141238.0,Wired Headphones,1.0,11.99,01/25/19 11:59,"387 10th St, Austin, TX 73301"
...,...,...,...,...,...,...
185970,319666.0,Lightning Charging Cable,1.0,14.95,12/11/19 20:58,"14 Madison St, San Francisco, CA 94016"
185971,319667.0,AA Batteries (4-pack),2.0,3.84,12/01/19 12:01,"549 Willow St, Los Angeles, CA 90001"
185972,319668.0,Google PHONE.,1.0,600.00,12/09/19 6:43,"273 Wilson St, Seattle, WA 98101"
185973,319669.0,Wired Headphones,1.0,11.99,12/03/19 10:39,"778 River St, Dallas, TX 75001"


## **Explore the dataset**

### **Shape of the dataset**

In [None]:
data.shape

(185975, 6)

### **Top Rows**

In [None]:
data.head()  # by default, returns the first 5 rows

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
0,141234.0,iPhone,1.0,700.0,01/22/19 21:25,"944 Walnut St, Boston, MA 02215"
1,141235.0,Lightning Charging Cable,1.0,14.95,01/28/19 14:15,"185 Maple St, Portland, OR 97035"
2,141236.0,Wired Headphones,2.0,11.99,01/17/19 13:33,"538 Adams St, San Francisco, CA 94016"
3,141237.0,27in FHD Monitor,1.0,149.99,01/05/19 20:33,"738 10th St, Los Angeles, CA 90001"
4,141238.0,Wired Headphones,1.0,11.99,01/25/19 11:59,"387 10th St, Austin, TX 73301"


In [None]:
data.head(20)

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
0,141234.0,iPhone,1.0,700.0,01/22/19 21:25,"944 Walnut St, Boston, MA 02215"
1,141235.0,Lightning Charging Cable,1.0,14.95,01/28/19 14:15,"185 Maple St, Portland, OR 97035"
2,141236.0,Wired Headphones,2.0,11.99,01/17/19 13:33,"538 Adams St, San Francisco, CA 94016"
3,141237.0,27in FHD Monitor,1.0,149.99,01/05/19 20:33,"738 10th St, Los Angeles, CA 90001"
4,141238.0,Wired Headphones,1.0,11.99,01/25/19 11:59,"387 10th St, Austin, TX 73301"
5,141239.0,AAA Batteries (4-pack),1.0,2.99,01/29/19 20:22,"775 Willow St, San Francisco, CA 94016"
6,141240.0,27in 4K Gaming Monitor,1.0,389.99,01/26/19 12:16,"979 Park St, Los Angeles, CA 90001"
7,141241.0,USB-C Charging Cable,1.0,11.95,01/05/19 12:04,"181 6th St, San Francisco, CA 94016"
8,141242.0,Bose SoundSport Headphones,1.0,99.99,01/01/19 10:30,"867 Willow St, Los Angeles, CA 90001"
9,141243.0,Apple Airpods Headphones,1.0,150.0,01/22/19 21:20,"657 Johnson St, San Francisco, CA 94016"


### **Last Rows**

In [None]:
data.tail(3)

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
185972,319668.0,Google PHONE.,1.0,600.0,12/09/19 6:43,"273 Wilson St, Seattle, WA 98101"
185973,319669.0,Wired Headphones,1.0,11.99,12/03/19 10:39,"778 River St, Dallas, TX 75001"
185974,319670.0,Bose SoundSport Headphones,1.0,99.99,12/21/19 21:45,"747 Chestnut St, Los Angeles, CA 90001"


### **Sample**

In [None]:
data.sample()  # by default, returns 1 random row

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
18167,158638.0,27in 4K Gaming Monitor,1.0,389.99,02/26/19 16:02,"417 1st St, Boston, MA 02215"


In [None]:
data.sample(5)

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
76257,214298.0,Apple Airpods Headphones,1.0,150.0,06/06/19 11:33,"513 Forest St, San Francisco, CA 94016"
119904,256262.0,AA Batteries (4-pack),1.0,3.84,09/27/19 20:41,"294 Washington St, Los Angeles, CA 90001"
16838,157357.0,AA Batteries (4-pack),1.0,3.84,02/01/19 10:12,"533 2nd St, Dallas, TX 75001"
177398,311429.0,34in Ultrawide Monitor,1.0,379.99,12/05/19 15:35,"359 Maple St, San Francisco, CA 94016"
180656,314556.0,Macbook Pro Laptop,1.0,1700.0,12/02/19 8:16,"709 1st St, Portland, ME 04101"
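`sample()` picks different rows every time the cell runs. Passing `random_state` makes the selection reproducible, and `frac` asks for a fraction of the rows instead of a fixed count. A small sketch on a few rows copied from the output above (the seed values 42 and 0 are arbitrary):

```python
import pandas as pd

orders = pd.DataFrame({
    "Product": ["iPhone", "Lightning Charging Cable", "Wired Headphones",
                "27in FHD Monitor", "Wired Headphones", "AAA Batteries (4-pack)"],
    "Price Each": [700.0, 14.95, 11.99, 149.99, 11.99, 2.99],
})

# Same random_state -> same rows on every run
first = orders.sample(3, random_state=42)
second = orders.sample(3, random_state=42)
print(first.index.tolist(), second.index.tolist())

# frac=0.5 returns half of the rows (3 of 6 here)
half = orders.sample(frac=0.5, random_state=0)
print(len(half))
```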


### **Information about dataset**


`data.info()` reports:

- the object type (a DataFrame)
- the number of rows (entries) and the index range
- the number of columns
- per-column details (position, name, non-null count, data type)
- the number of columns of each data type
- the memory usage


In [None]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 185975 entries, 0 to 185974
Data columns (total 6 columns):
 #   Column            Non-Null Count   Dtype  
---  ------            --------------   -----  
 0   Order ID          185951 non-null  float64
 1   Product           185951 non-null  object 
 2   Quantity Ordered  185901 non-null  float64
 3   Price Each        185946 non-null  float64
 4   Order Date        185951 non-null  object 
 5   Purchase Address  185951 non-null  object 
dtypes: float64(3), object(3)
memory usage: 8.5+ MB


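Issues visible in this output:
- Missing values: every column has fewer non-null values than the 185975 entries
- Data type issues: `Order ID` and `Quantity Ordered` are stored as float64 instead of integers, and `Order Date` is stored as object (text) instead of datetime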
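A minimal sketch of how the two data type issues could be fixed, run on a tiny stand-in frame (values copied from rows shown above; the NaN row is made up to mimic the missing entries). It assumes every `Order Date` follows the `MM/DD/YY HH:MM` pattern seen in the output, and uses pandas' nullable `"Int64"` dtype as one option for whole numbers that contain missing values.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the sales data; the NaN row mimics the missing entries
orders = pd.DataFrame({
    "Order ID": [141234.0, 319668.0, np.nan],
    "Order Date": ["01/22/19 21:25", "12/09/19 6:43", np.nan],
})

# Nullable integer dtype: whole-number IDs become integers and NaN becomes <NA>.
# (A value with a fractional part, such as 0.965432, would make this cast fail.)
orders["Order ID"] = orders["Order ID"].astype("Int64")

# Parse month/day/two-digit-year strings into datetimes; NaN becomes NaT
orders["Order Date"] = pd.to_datetime(orders["Order Date"], format="%m/%d/%y %H:%M")

print(orders.dtypes)
print(orders)
```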

### **Statistical Summary (descriptive statistics)**


In [None]:
data.describe() # by default, it will return the summary of numerical columns

Unnamed: 0,Order ID,Quantity Ordered,Price Each
count,185951.0,185901.0,185946.0
mean,230417.792402,1.124415,724.4232
std,51512.688373,0.442846,231902.9
min,141234.0,0.965432,2.99
25%,185831.5,1.0,11.95
50%,230368.0,1.0,14.95
75%,275035.5,1.0,150.0
max,319670.0,9.0,100000000.0
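Two values in this summary look suspicious: a `Quantity Ordered` minimum of 0.965432 and a `Price Each` maximum of 100000000.0, which also inflates the mean (724.42) and std (231902.9). One common rule of thumb, an assumption here rather than something this notebook uses, flags values more than 1.5 × IQR beyond the first or third quartile. A sketch on a handful of prices copied from the output above:

```python
import pandas as pd

# A few prices from the rows shown above, plus the suspicious maximum from describe()
prices = pd.Series([700.0, 14.95, 11.99, 149.99, 11.99, 2.99, 100000000.0])

q1, q3 = prices.quantile(0.25), prices.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = prices[(prices < lower) | (prices > upper)]
print(outliers)

# The median is barely affected by the outlier, unlike the mean
print(prices.mean(), prices.median())
```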


In [None]:
data.describe(include="object")

Unnamed: 0,Product,Order Date,Purchase Address
count,185951,185951,185951
unique,32,142396,140787
top,USB-C Charging Cable,12/15/19 20:16,"193 Forest St, San Francisco, CA 94016"
freq,21903,8,9


In [None]:
data.describe(include="all")

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
count,185951.0,185951,185901.0,185946.0,185951,185951
unique,,32,,,142396,140787
top,,USB-C Charging Cable,,,12/15/19 20:16,"193 Forest St, San Francisco, CA 94016"
freq,,21903,,,8,9
mean,230417.792402,,1.124415,724.4232,,
std,51512.688373,,0.442846,231902.9,,
min,141234.0,,0.965432,2.99,,
25%,185831.5,,1.0,11.95,,
50%,230368.0,,1.0,14.95,,
75%,275035.5,,1.0,150.0,,


In [None]:
summary = data.describe(include="all")

In [None]:
summary.to_csv("summary.csv")  # save the summary table to a file

### **Check the data types**


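How pandas decides a column's data type:

- Text such as `"Ahmad"` or `"[1,2,3,4]"` (a list written as a string) is stored as `object`.
- Numbers are stored as `int` (e.g. `5, 6, 6`) or `float` (e.g. `5.6`).

Examples:

- `1, 4, 6, 78, 9` → `int64`
- `9.8, 7.8, 5.6, 9.0` → `float64`
- `"ahmad", "ali", "usman"` → `object`
- `34, 23, 26, 28, "27 yrs"` → `object` (one text value turns the whole column into `object`)
- `4, 5, 7, 1, 2, 4.5` → `float64` (one float turns the whole column into `float64`)
- `1, 2, 3, (blank), 4, 5` → `float64`

**If an integer column has a blank (missing) value, it is converted to float**, because the blank is stored as `NaN`, which is a float.

Rules of thumb:

- int only → int
- int + one float value → float
- int/float + object → object
- datetime / date values → `datetime64`; dates read from a CSV stay `object` (text) until they are parsed with `pd.to_datetime` or `pd.read_csv(..., parse_dates=[...])`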
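The rules above can be checked directly by building a Series from each example list and printing the inferred dtype. The CSV part uses an in-memory string (an assumption for portability) to show a blank cell turning an integer column into float. Newer pandas versions may report a dedicated string dtype instead of `object` for the pure-text column.

```python
import io

import numpy as np
import pandas as pd

examples = {
    "ints": [1, 4, 6, 78, 9],
    "floats": [9.8, 7.8, 5.6, 9.0],
    "strings": ["ahmad", "ali", "usman"],
    "int_plus_one_float": [4, 5, 7, 1, 2, 4.5],
    "numbers_and_text": [34, 23, 26, 28, "27 yrs"],
    "int_with_missing": [1, 2, 3, np.nan, 4, 5],
}
inferred = {name: pd.Series(values).dtype for name, values in examples.items()}
for name, dtype in inferred.items():
    print(f"{name:20} -> {dtype}")

# A blank cell in a CSV is read as NaN, so the whole "age" column becomes float64
csv_text = "id,age\n1,34\n2,23\n3,\n4,28\n"
from_csv = pd.read_csv(io.StringIO(csv_text))
print(from_csv.dtypes)
```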

In [None]:
data.dtypes

Unnamed: 0,0
Order ID,float64
Product,object
Quantity Ordered,float64
Price Each,float64
Order Date,object
Purchase Address,object


### **Column names**

In [None]:
# list the column names
data.columns

Index(['Order ID', 'Product', 'Quantity Ordered', 'Price Each', 'Order Date',
       'Purchase Address'],
      dtype='object')

### **Unique Values**

In [None]:
data['Quantity Ordered'].unique()

array([1.      , 2.      , 3.      , 5.      , 4.      , 7.      ,
       6.      ,      nan, 0.965432, 9.      , 8.      ])

In [None]:
data['Product'].unique()  # returns the distinct values (classes) in the column, including nan

array(['iPhone', 'Lightning Charging Cable', 'Wired Headphones',
       '27in FHD Monitor', 'AAA Batteries (4-pack)',
       '27in 4K Gaming Monitor', 'USB-C Charging Cable',
       'Bose SoundSport Headphones', 'Apple Airpods Headphones',
       'Macbook Pro Laptop', 'Flatscreen TV', 'Google PHONE.',
       'AA Batteries (4-pack)', 'Google Phone', '20in Monitor',
       '34in Ultrawide Monitor', 'ThinkPad Laptop', 'LG Dryer',
       'LG Washing Machine', '            iPhone       ', 'GOOGLE. PHONE',
       'IPhone', 'IPHONE', 'I Phone', 'Google-Phone', nan, 'iphone.',
       'google phoone', 'google phone', 'iphone', 'i-Phone',
       'google Phone', 'ASUS Gaming Laptop'], dtype=object)

In [None]:
data["Product"].nunique()  # number of distinct values; NaN is not counted by default

32
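`unique()` keeps `nan` in its result (the array above has 33 entries, one of them `nan`), while `nunique()` skips missing values by default, which is why it reports 32. A tiny made-up example that also shows differently cased spellings being counted separately:

```python
import numpy as np
import pandas as pd

products = pd.Series(["iPhone", "IPHONE", "iPhone", np.nan])

print(products.unique())               # 'iPhone', 'IPHONE' and nan
print(len(products.unique()))          # 3
print(products.nunique())              # 2: NaN is not counted
print(products.nunique(dropna=False))  # 3: NaN counted as a value
```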

In [None]:
data.describe(include="object")

Unnamed: 0,Product,Order Date,Purchase Address
count,185951,185951,185951
unique,32,142396,140787
top,USB-C Charging Cable,12/15/19 20:16,"193 Forest St, San Francisco, CA 94016"
freq,21903,8,9


### **Value counts**

In [None]:
data["Product"].value_counts()

Product,count
USB-C Charging Cable,21903
Lightning Charging Cable,21658
AAA Batteries (4-pack),20641
AA Batteries (4-pack),20577
Wired Headphones,18882
Apple Airpods Headphones,15549
Bose SoundSport Headphones,13325
27in FHD Monitor,7507
27in 4K Gaming Monitor,6230
34in Ultrawide Monitor,6181


In [None]:
data["Quantity Ordered"].value_counts()

Quantity Ordered,count
1.0,168502
2.0,13324
3.0,2920
4.0,806
5.0,236
6.0,80
7.0,24
8.0,5
9.0,3
0.965432,1
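By default `value_counts()` leaves out missing values, so the counts above do not show the NaN entries. A small sketch of two useful parameters, `dropna=False` (count NaN too) and `normalize=True` (proportions instead of counts), on a made-up Series:

```python
import numpy as np
import pandas as pd

quantities = pd.Series([1.0, 2.0, 1.0, np.nan, 1.0])

counts = quantities.value_counts()                    # NaN is dropped by default
with_missing = quantities.value_counts(dropna=False)  # NaN gets its own row
shares = quantities.value_counts(normalize=True)      # proportions of the non-missing values

print(counts, with_missing, shares, sep="\n\n")
```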


### **Loc and iloc**

In [None]:
data["Product"]  # selecting a single column returns a Series

Unnamed: 0,Product
0,iPhone
1,Lightning Charging Cable
2,Wired Headphones
3,27in FHD Monitor
4,Wired Headphones
...,...
185970,Lightning Charging Cable
185971,AA Batteries (4-pack)
185972,Google PHONE.
185973,Wired Headphones


#### **iloc**

- Position based: rows and columns are selected by their integer position (0, 1, 2, ...), not by their labels

In [None]:
data.iloc[0]  # a single integer position returns that row as a Series

Unnamed: 0,0
Order ID,141234.0
Product,iPhone
Quantity Ordered,1.0
Price Each,700.0
Order Date,01/22/19 21:25
Purchase Address,"944 Walnut St, Boston, MA 02215"


In [None]:
data.iloc[1:5]

# data.iloc[start : stop]
# the stop position is excluded, so this returns rows 1 to 4 (row 5 is not included)

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
1,141235.0,Lightning Charging Cable,1.0,14.95,01/28/19 14:15,"185 Maple St, Portland, OR 97035"
2,141236.0,Wired Headphones,2.0,11.99,01/17/19 13:33,"538 Adams St, San Francisco, CA 94016"
3,141237.0,27in FHD Monitor,1.0,149.99,01/05/19 20:33,"738 10th St, Los Angeles, CA 90001"
4,141238.0,Wired Headphones,1.0,11.99,01/25/19 11:59,"387 10th St, Austin, TX 73301"


In [None]:
data.iloc[1:5, 0:3]  # rows at positions 1-4 and columns at positions 0-2 (both stops excluded)

Unnamed: 0,Order ID,Product,Quantity Ordered
1,141235.0,Lightning Charging Cable,1.0
2,141236.0,Wired Headphones,2.0
3,141237.0,27in FHD Monitor,1.0
4,141238.0,Wired Headphones,1.0


In [None]:
data.iloc[[5,1000,1500], [0,1,3,5]]  # lists pick specific positions: rows 5, 1000, 1500 and columns 0, 1, 3, 5

Unnamed: 0,Order ID,Product,Price Each,Purchase Address
5,141239.0,AAA Batteries (4-pack),2.99,"775 Willow St, San Francisco, CA 94016"
1000,142192.0,USB-C Charging Cable,11.95,"68 11th St, Boston, MA 02215"
1500,142668.0,AAA Batteries (4-pack),2.99,"415 Washington St, San Francisco, CA 94016"


In [None]:
data[data["Order ID"] == 141238.0]  # boolean (conditional) filtering: keep only rows where the condition is True

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
4,141238.0,Wired Headphones,1.0,11.99,01/25/19 11:59,"387 10th St, Austin, TX 73301"
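Conditions can be combined. A sketch on a small frame loosely based on the rows above: `&` means "and", `|` means "or", each condition needs its own parentheses, and `isin()` keeps rows whose value appears in a list.

```python
import pandas as pd

orders = pd.DataFrame({
    "Product": ["iPhone", "Wired Headphones", "Wired Headphones", "27in FHD Monitor"],
    "Quantity Ordered": [1.0, 2.0, 1.0, 1.0],
    "Price Each": [700.0, 11.99, 11.99, 149.99],
})

# Both conditions must hold; the parentheses are required because & binds tightly
mask = (orders["Product"] == "Wired Headphones") & (orders["Quantity Ordered"] > 1)
print(orders[mask])

# isin() keeps rows whose value is in the given list
picked = orders[orders["Product"].isin(["iPhone", "27in FHD Monitor"])]
print(picked)

# A mask also works inside loc, together with a column selection
print(orders.loc[mask, ["Product", "Price Each"]])
```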


#### **loc**

- Label based: rows are selected by index label and columns by column name

In [None]:
data.loc[[5,1000,1500], ["Order ID","Price Each","Purchase Address"]]

Unnamed: 0,Order ID,Price Each,Purchase Address
5,141239.0,2.99,"775 Willow St, San Francisco, CA 94016"
1000,142192.0,11.95,"68 11th St, Boston, MA 02215"
1500,142668.0,2.99,"415 Washington St, San Francisco, CA 94016"


In [None]:
data.loc[5:10, "Order ID":"Price Each"]
# label slices include the end label: rows 5 to 10 and every column from "Order ID" through "Price Each"

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each
5,141239.0,AAA Batteries (4-pack),1.0,2.99
6,141240.0,27in 4K Gaming Monitor,1.0,389.99
7,141241.0,USB-C Charging Cable,1.0,11.95
8,141242.0,Bose SoundSport Headphones,1.0,99.99
9,141243.0,Apple Airpods Headphones,1.0,150.0
10,141244.0,Apple Airpods Headphones,1.0,150.0


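loc vs iloc

- Both are used to select rows and columns.
- `loc`:
  - label based (index labels and column names)
  - the end label of a slice is included
- `iloc`:
  - integer-position based
  - the end position of a slice is excluded

With the default `RangeIndex` (0, 1, 2, ...), row labels and positions coincide, which is why `data.loc[5:10]` selects the same rows as `data.iloc[5:11]`.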
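The difference is easiest to see when index labels do not match positions. A small sketch with made-up labels 10, 20, 30, 40:

```python
import pandas as pd

# Index labels deliberately differ from the positions 0, 1, 2, 3
df = pd.DataFrame(
    {"Product": ["iPhone", "Wired Headphones", "Google Phone", "LG Dryer"]},
    index=[10, 20, 30, 40],
)

by_label = df.loc[10:30, "Product"]  # labels 10, 20 and 30: end label included
by_position = df.iloc[0:2, 0]        # positions 0 and 1: end position excluded

print(by_label)
print(by_position)
print(df.loc[40, "Product"] == df.iloc[3, 0])  # the same cell, reached both ways
```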

### **Adding a column**

In [None]:
data.head()

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address
0,141234.0,iPhone,1.0,700.0,01/22/19 21:25,"944 Walnut St, Boston, MA 02215"
1,141235.0,Lightning Charging Cable,1.0,14.95,01/28/19 14:15,"185 Maple St, Portland, OR 97035"
2,141236.0,Wired Headphones,2.0,11.99,01/17/19 13:33,"538 Adams St, San Francisco, CA 94016"
3,141237.0,27in FHD Monitor,1.0,149.99,01/05/19 20:33,"738 10th St, Los Angeles, CA 90001"
4,141238.0,Wired Headphones,1.0,11.99,01/25/19 11:59,"387 10th St, Austin, TX 73301"


In [None]:
data["Extra"] = "Maimoona"  # assigning a single value fills every row of the new column

In [None]:
data.head()

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address,Extra
0,141234.0,iPhone,1.0,700.0,01/22/19 21:25,"944 Walnut St, Boston, MA 02215",Maimoona
1,141235.0,Lightning Charging Cable,1.0,14.95,01/28/19 14:15,"185 Maple St, Portland, OR 97035",Maimoona
2,141236.0,Wired Headphones,2.0,11.99,01/17/19 13:33,"538 Adams St, San Francisco, CA 94016",Maimoona
3,141237.0,27in FHD Monitor,1.0,149.99,01/05/19 20:33,"738 10th St, Los Angeles, CA 90001",Maimoona
4,141238.0,Wired Headphones,1.0,11.99,01/25/19 11:59,"387 10th St, Austin, TX 73301",Maimoona
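Besides a constant, a new column is often computed from existing ones. A sketch on three rows copied from the output above; the `Sales` column name is a hypothetical choice for illustration.

```python
import pandas as pd

orders = pd.DataFrame({
    "Quantity Ordered": [1.0, 1.0, 2.0],
    "Price Each": [700.0, 14.95, 11.99],
})

# A scalar is broadcast to every row
orders["Extra"] = "Maimoona"

# Arithmetic between columns is applied element-wise, row by row
orders["Sales"] = orders["Quantity Ordered"] * orders["Price Each"]

print(orders)
```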


In [None]:
data.head()

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address,Extra
0,141234.0,iPhone,1.0,700.0,01/22/19 21:25,"944 Walnut St, Boston, MA 02215",Maimoona
1,141235.0,Lightning Charging Cable,1.0,14.95,01/28/19 14:15,"185 Maple St, Portland, OR 97035",Maimoona
2,141236.0,Wired Headphones,2.0,11.99,01/17/19 13:33,"538 Adams St, San Francisco, CA 94016",Maimoona
3,141237.0,27in FHD Monitor,1.0,149.99,01/05/19 20:33,"738 10th St, Los Angeles, CA 90001",Maimoona
4,141238.0,Wired Headphones,1.0,11.99,01/25/19 11:59,"387 10th St, Austin, TX 73301",Maimoona


In [None]:
data["Purchase Address"].str.split(",", expand=True)

Unnamed: 0,0,1,2
0,944 Walnut St,Boston,MA 02215
1,185 Maple St,Portland,OR 97035
2,538 Adams St,San Francisco,CA 94016
3,738 10th St,Los Angeles,CA 90001
4,387 10th St,Austin,TX 73301
...,...,...,...
185970,14 Madison St,San Francisco,CA 94016
185971,549 Willow St,Los Angeles,CA 90001
185972,273 Wilson St,Seattle,WA 98101
185973,778 River St,Dallas,TX 75001


In [None]:
data[["Street","City","State"]] = data["Purchase Address"].str.split(",", expand=True)

In [None]:
data.head()

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address,Extra,Street,City,State
0,141234.0,iPhone,1.0,700.0,01/22/19 21:25,"944 Walnut St, Boston, MA 02215",Maimoona,944 Walnut St,Boston,MA 02215
1,141235.0,Lightning Charging Cable,1.0,14.95,01/28/19 14:15,"185 Maple St, Portland, OR 97035",Maimoona,185 Maple St,Portland,OR 97035
2,141236.0,Wired Headphones,2.0,11.99,01/17/19 13:33,"538 Adams St, San Francisco, CA 94016",Maimoona,538 Adams St,San Francisco,CA 94016
3,141237.0,27in FHD Monitor,1.0,149.99,01/05/19 20:33,"738 10th St, Los Angeles, CA 90001",Maimoona,738 10th St,Los Angeles,CA 90001
4,141238.0,Wired Headphones,1.0,11.99,01/25/19 11:59,"387 10th St, Austin, TX 73301",Maimoona,387 10th St,Austin,TX 73301


Splitting on "," leaves a leading space at the start of the city part and the state/zip part:

```
944 Walnut St, Boston, MA 02215

944 Walnut St
 Boston
 MA 02215
```

In [None]:
data["City"] = data["City"].str.strip(" ")  # remove the leading space left by the split

In [None]:
data["State"] = data["State"].str.strip(" ")

In [None]:
data.head()

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address,Extra,Street,City,State
0,141234.0,iPhone,1.0,700.0,01/22/19 21:25,"944 Walnut St, Boston, MA 02215",Maimoona,944 Walnut St,Boston,MA 02215
1,141235.0,Lightning Charging Cable,1.0,14.95,01/28/19 14:15,"185 Maple St, Portland, OR 97035",Maimoona,185 Maple St,Portland,OR 97035
2,141236.0,Wired Headphones,2.0,11.99,01/17/19 13:33,"538 Adams St, San Francisco, CA 94016",Maimoona,538 Adams St,San Francisco,CA 94016
3,141237.0,27in FHD Monitor,1.0,149.99,01/05/19 20:33,"738 10th St, Los Angeles, CA 90001",Maimoona,738 10th St,Los Angeles,CA 90001
4,141238.0,Wired Headphones,1.0,11.99,01/25/19 11:59,"387 10th St, Austin, TX 73301",Maimoona,387 10th St,Austin,TX 73301


In [None]:
data[["State","Zip Code"]] = data["State"].str.split(" ",expand=True)  # "MA 02215" -> "MA", "02215"; the zip stays a string, so leading zeros are kept

In [None]:
data.head()

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Purchase Address,Extra,Street,City,State,Zip Code
0,141234.0,iPhone,1.0,700.0,01/22/19 21:25,"944 Walnut St, Boston, MA 02215",Maimoona,944 Walnut St,Boston,MA,02215
1,141235.0,Lightning Charging Cable,1.0,14.95,01/28/19 14:15,"185 Maple St, Portland, OR 97035",Maimoona,185 Maple St,Portland,OR,97035
2,141236.0,Wired Headphones,2.0,11.99,01/17/19 13:33,"538 Adams St, San Francisco, CA 94016",Maimoona,538 Adams St,San Francisco,CA,94016
3,141237.0,27in FHD Monitor,1.0,149.99,01/05/19 20:33,"738 10th St, Los Angeles, CA 90001",Maimoona,738 10th St,Los Angeles,CA,90001
4,141238.0,Wired Headphones,1.0,11.99,01/25/19 11:59,"387 10th St, Austin, TX 73301",Maimoona,387 10th St,Austin,TX,73301
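An alternative sketch (not the notebook's approach) splits on `", "` so no stripping is needed, then uses `n=1` so only the first space separates the state from the zip code. Keeping the zip code as a string preserves leading zeros such as the one in `02215`.

```python
import pandas as pd

addresses = pd.Series([
    "944 Walnut St, Boston, MA 02215",
    "185 Maple St, Portland, OR 97035",
])

# Splitting on ", " (comma + space) leaves no leading spaces to strip
parts = addresses.str.split(", ", expand=True)
parts.columns = ["Street", "City", "State Zip"]

# n=1: split only at the first space
state_zip = parts["State Zip"].str.split(" ", n=1, expand=True)
parts["State"] = state_zip[0]
parts["Zip Code"] = state_zip[1]  # still text, so "02215" keeps its leading zero
parts = parts.drop(columns="State Zip")

print(parts)
```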


In [None]:
data["City"].value_counts()

City,count
San Francisco,44732
Los Angeles,29606
New York City,24876
Boston,19934
Atlanta,14881
Dallas,14820
Seattle,14732
Portland,12465
Austin,9905


### **Dropping a column**


In [None]:
data.drop(columns="Purchase Address",inplace=True)

In [None]:
data.head()

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Extra,Street,City,State,Zip Code
0,141234.0,iPhone,1.0,700.0,01/22/19 21:25,Maimoona,944 Walnut St,Boston,MA,02215
1,141235.0,Lightning Charging Cable,1.0,14.95,01/28/19 14:15,Maimoona,185 Maple St,Portland,OR,97035
2,141236.0,Wired Headphones,2.0,11.99,01/17/19 13:33,Maimoona,538 Adams St,San Francisco,CA,94016
3,141237.0,27in FHD Monitor,1.0,149.99,01/05/19 20:33,Maimoona,738 10th St,Los Angeles,CA,90001
4,141238.0,Wired Headphones,1.0,11.99,01/25/19 11:59,Maimoona,387 10th St,Austin,TX,73301


In [None]:
# 1st way: drop() returns a new DataFrame, so assign the result back to data

data = data.drop(columns="Extra")

In [None]:
data.head()

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Street,City,State,Zip Code
0,141234.0,iPhone,1.0,700.0,01/22/19 21:25,944 Walnut St,Boston,MA,02215
1,141235.0,Lightning Charging Cable,1.0,14.95,01/28/19 14:15,185 Maple St,Portland,OR,97035
2,141236.0,Wired Headphones,2.0,11.99,01/17/19 13:33,538 Adams St,San Francisco,CA,94016
3,141237.0,27in FHD Monitor,1.0,149.99,01/05/19 20:33,738 10th St,Los Angeles,CA,90001
4,141238.0,Wired Headphones,1.0,11.99,01/25/19 11:59,387 10th St,Austin,TX,73301


In [None]:
# data = data without extra column

# data operation( mention inplace=True)

In [None]:
#2nd way: inplace

data.drop(columns="Extra",inplace=True)

In [None]:
data.head()

Unnamed: 0,Order ID,Product,Quantity Ordered,Price Each,Order Date,Street,City,State,Zip Code
0,141234.0,iPhone,1.0,700.0,01/22/19 21:25,944 Walnut St,Boston,MA,02215
1,141235.0,Lightning Charging Cable,1.0,14.95,01/28/19 14:15,185 Maple St,Portland,OR,97035
2,141236.0,Wired Headphones,2.0,11.99,01/17/19 13:33,538 Adams St,San Francisco,CA,94016
3,141237.0,27in FHD Monitor,1.0,149.99,01/05/19 20:33,738 10th St,Los Angeles,CA,90001
4,141238.0,Wired Headphones,1.0,11.99,01/25/19 11:59,387 10th St,Austin,TX,73301
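A small sketch of what each way of dropping actually returns, on a made-up two-row frame. It also shows `errors="ignore"`, which skips a column that is already gone instead of raising a KeyError.

```python
import pandas as pd

df = pd.DataFrame({"Product": ["iPhone", "LG Dryer"], "Extra": ["Maimoona", "Maimoona"]})

# 1st way: drop() returns a new DataFrame; df itself is untouched until you reassign
without_extra = df.drop(columns="Extra")
print("Extra" in df.columns, "Extra" in without_extra.columns)  # True False

# 2nd way: inplace=True changes df directly and returns None
returned = df.drop(columns="Extra", inplace=True)
print(returned, "Extra" in df.columns)  # None False

# Dropping a column that no longer exists raises KeyError unless errors="ignore"
df = df.drop(columns="Extra", errors="ignore")
```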


In [None]:
data.dtypes

Unnamed: 0,0
Order ID,float64
Product,object
Quantity Ordered,float64
Price Each,float64
Order Date,object
Street,object
City,object
State,object
Zip Code,object


### **Missing values**

In [None]:
data.isnull().sum()

Unnamed: 0,0
Order ID,24
Product,24
Quantity Ordered,74
Price Each,29
Order Date,24
Street,24
City,24
State,24
Zip Code,24
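Most columns are missing exactly 24 values, which suggests (but does not prove) that those are fully blank rows. A sketch of the usual options on a tiny made-up frame; the fill value 1.0 is only an illustration, not a recommendation for this dataset.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Order ID": [141234.0, np.nan, 141236.0],
    "Quantity Ordered": [1.0, np.nan, np.nan],
})

print(df.isna().sum())  # isnull() is an alias of isna()

all_blank_removed = df.dropna(how="all")       # drops only rows where every value is NaN
complete_rows = df.dropna()                    # drops rows with any NaN
filled = df.fillna({"Quantity Ordered": 1.0})  # fills chosen columns from a dict

print(all_blank_removed, complete_rows, filled, sep="\n\n")
```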


### **Duplicates**

In [None]:
data.duplicated().sum()

np.int64(289)
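`duplicated()` flags every repeat after the first occurrence, so the 289 above counts extra copies, not the originals. A sketch on a made-up frame showing `keep=False` (flag every copy, handy for inspecting them) and `drop_duplicates()`:

```python
import pandas as pd

df = pd.DataFrame({
    "Order ID": [141234.0, 141235.0, 141234.0],
    "Product": ["iPhone", "Lightning Charging Cable", "iPhone"],
})

print(df.duplicated())            # only the second "iPhone" row is flagged
print(df.duplicated(keep=False))  # both copies are flagged
deduped = df.drop_duplicates()    # keeps the first occurrence of each row
print(deduped)
```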

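Data quality issues found so far:

- Data type issues (`Order ID` and `Quantity Ordered` stored as float64, `Order Date` stored as text)
- Duplicates (289 duplicated rows)
- Missing values
- Outliers (e.g. a `Price Each` of 100000000.0 and a `Quantity Ordered` of 0.965432)
- Text inconsistency (e.g. "iPhone", "IPHONE", "I Phone" for the same product)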
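One possible way to tackle the text inconsistency (an assumption, not the notebook's method) is to build a normalized key by lower-casing and removing everything except letters and digits, then map each key to one preferred spelling. The `canonical` mapping below is hypothetical, and a real typo such as "google phoone" still needs its own entry.

```python
import pandas as pd

raw = pd.Series([
    "iPhone", "            iPhone       ", "IPHONE", "I Phone", "i-Phone", "iphone.",
    "Google PHONE.", "GOOGLE. PHONE", "Google-Phone", "google phone", "google phoone",
])

# Lower-case and keep only letters and digits: "I Phone" -> "iphone"
key = raw.str.lower().str.replace(r"[^a-z0-9]", "", regex=True)

# Hypothetical mapping from normalized key to the spelling we want to keep
canonical = {"iphone": "iPhone", "googlephone": "Google Phone"}

# Unmapped keys (e.g. the typo "google phoone") fall back to the stripped original
cleaned = key.map(canonical).fillna(raw.str.strip())
print(cleaned.value_counts())
```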

In [None]:
data.to_csv("data.csv", index=False)  # index=False: do not write the row index as an extra column