# **Pandas - Python Library**

### **What is Pandas?**

- **Pandas is a Python library for data analysis.**
- **It provides functions for analyzing, cleaning, exploring and manipulating data.**

### **Why Use Pandas?**

- **It helps to analyze large datasets.**
- **It helps draw conclusions from data using statistics.**
- **Pandas can clean messy datasets.**
- **Pandas methods can delete rows, rename columns, access specific data and filter data.**

### **Installing Pandas**

- **Pandas can be installed from PyPI with `pip install pandas`.**

## **- Pandas Series**

- **A pandas Series is like a column in a table.**
- **It is a one-dimensional array that can hold data of any type.**
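
As a minimal sketch of the idea above, the following builds a small Series by hand. The variable name `prices` and the string labels are chosen for illustration only; the values are taken from the first rows of the dataset used later in this notebook.

```python
import pandas as pd

# A Series is a one-dimensional labelled array; here the labels are custom strings
prices = pd.Series(
    [574.15, 1644.89, 642.71],
    index=["Router", "Laptop", "Mouse"],
    name="Price",
)

print(prices["Laptop"])   # look up a value by its label
print(prices.shape)       # a single dimension: (3,)
```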

## **- Pandas DataFrames**

- **Datasets in Pandas are usually two-dimensional tables, called DataFrames.**
- **A Series is like a single column; a DataFrame is the whole table.**
- **A DataFrame is a two-dimensional data structure, like a two-dimensional array or a table with rows and columns.**
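
The relationship between a DataFrame and its columns can be sketched with a small hand-built table. The name `products` and the three sample rows are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Each dictionary key becomes a column; each column is itself a Series
products = pd.DataFrame({
    "Product_Name": ["Router", "Laptop", "Mouse"],
    "Brand": ["Logitech", "HP", "Logitech"],
    "Price": [574.15, 1644.89, 642.71],
})

print(products.shape)            # (rows, columns)
print(type(products["Price"]))   # selecting one column returns a Series
```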

### **Import Pandas**

In [12]:
# importing pandas library so we can use its methods in our notebook
import pandas as pd     # here we create alias name for pandas library which is pd

### **Load Files into DataFrame**

#### **Pandas - Read CSV file**

In [15]:
# load the given dataset using the read_csv() function of pandas
# df is the variable that stores the data read from pandas_practice1_dataset.csv
df = pd.read_csv("C:\\Users\\HP\\Downloads\\pandas_practice1_dataset.csv")
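
The path above only exists on one particular Windows machine. As a self-contained sketch, `read_csv()` can also read from any file-like object, so the example below feeds it a few lines of CSV text through `io.StringIO`. The names `csv_text` and `sample` are hypothetical, and the rows are a shortened copy of the first records of the dataset.

```python
import io
import pandas as pd

# A tiny CSV held in memory instead of on disk
csv_text = """Product_ID,Product_Name,Brand,Category,Price
1001,Router,Logitech,Accessories,574.15
1002,Laptop,HP,Accessories,1644.89
1003,Mouse,Logitech,Peripherals,642.71
"""

# read_csv accepts a file path or any file-like object
sample = pd.read_csv(io.StringIO(csv_text))
print(sample.shape)
print(sample.dtypes)
```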

## **Functions -**

### **1. head()** - Used to view the first rows

In [18]:
# display the first three rows
df.head(3)

Unnamed: 0,Product_ID,Product_Name,Brand,Category,Price,Stock_Units,Country_Of_Origin,Launch_Date
0,1001,Router,Logitech,Accessories,574.15,188.0,Germany,2020-06-22
1,1002,Laptop,HP,Accessories,1644.89,203.0,UK,2023-10-15
2,1003,Mouse,Logitech,Peripherals,642.71,325.0,India,2019-02-22


In [19]:
# by default head() displays 5 rows
df.head()

Unnamed: 0,Product_ID,Product_Name,Brand,Category,Price,Stock_Units,Country_Of_Origin,Launch_Date
0,1001,Router,Logitech,Accessories,574.15,188.0,Germany,2020-06-22
1,1002,Laptop,HP,Accessories,1644.89,203.0,UK,2023-10-15
2,1003,Mouse,Logitech,Peripherals,642.71,325.0,India,2019-02-22
3,1004,Keyboard,Samsung,Peripherals,888.96,439.0,USA,2020-12-22
4,1005,Keyboard,Canon,Electronics,1500.41,51.0,Japan,2019-11-27


### **2. shape** - For displaying count of rows and columns

In [21]:
# display the count of row and columns
df.shape

(515, 8)

- **This dataset has 515 rows (entries) and 8 columns.**

### **3. columns** - Used to view column names

In [24]:
# display the column names
df.columns

Index(['Product_ID', 'Product_Name', 'Brand', 'Category', 'Price',
       'Stock_Units', 'Country_Of_Origin', 'Launch_Date'],
      dtype='object')

### **4. dtypes** - Used to view datatypes of each column

In [26]:
# displaying datatypes of each column
df.dtypes

Product_ID             int64
Product_Name          object
Brand                 object
Category              object
Price                float64
Stock_Units          float64
Country_Of_Origin     object
Launch_Date           object
dtype: object

- **In this dataset, Product_ID has an integer datatype (int64).**
- **Product_Name, Brand, Category, Country_Of_Origin and Launch_Date have the object datatype.**
- **Price and Stock_Units have a float datatype (float64).**

### **5. duplicated()** - Used to check duplicates

In [29]:
# returns a boolean for each row: True if the row repeats an earlier row
df.duplicated()

0      False
1      False
2      False
3      False
4      False
       ...  
510     True
511     True
512     True
513     True
514     True
Length: 515, dtype: bool

In [30]:
# duplicated() flags repeated rows and sum() counts the True values
df.duplicated().sum()

15

- **This dataset contains 15 duplicate rows.**

### **6. drop_duplicates()** - Used to drop duplicates

In [33]:
# drop all the duplicates
df = df.drop_duplicates()

In [34]:
# check the count of duplicates now
df.duplicated().sum()

0

- **Now there are no duplicates in this data.**
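
How `duplicated()` and `drop_duplicates()` decide what counts as a duplicate is easier to see on a tiny table. The sketch below uses a hypothetical four-row frame named `toy`; `subset=` and `reset_index(drop=True)` are standard pandas options that the notebook itself does not use.

```python
import pandas as pd

toy = pd.DataFrame({
    "Product_ID": [1, 2, 2, 3],
    "Brand": ["HP", "Dell", "Dell", "HP"],
})

# duplicated() flags a row only when every column matches an earlier row
flags = toy.duplicated()
print(flags.tolist())

# subset= restricts the comparison to the listed columns
brand_flags = toy.duplicated(subset=["Brand"])
print(brand_flags.tolist())

# drop_duplicates() keeps the first occurrence; reset_index() renumbers the rows
deduped = toy.drop_duplicates().reset_index(drop=True)
print(len(deduped))
```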

### **7. isnull()** - For checking missing values

In [37]:
df.isnull()   # returns True where a value is missing (NaN) and False otherwise

Unnamed: 0,Product_ID,Product_Name,Brand,Category,Price,Stock_Units,Country_Of_Origin,Launch_Date
0,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...
495,False,False,False,False,False,False,False,False
496,False,False,False,False,False,False,False,False
497,False,False,False,False,False,False,False,False
498,False,False,False,False,False,False,False,False


In [38]:
# check for missing values 
df.isnull().sum()    # isnull() method checks if null and sum() method gives count of null values for each column

Product_ID            0
Product_Name          0
Brand                20
Category              0
Price                20
Stock_Units          20
Country_Of_Origin     0
Launch_Date           0
dtype: int64

- **This dataset has 20 missing values in Brand.**
- **It has 20 missing values in Price.**
- **It has 20 missing values in Stock_Units.**

### **8. fillna()** - Used to fill missing data 

In [41]:
# filling missing value of Price with mean 
df["Price"] = df["Price"].fillna(df["Price"].mean()) 

In [42]:
# filling missing value of Stock_Units with mean
df["Stock_Units"] = df["Stock_Units"].fillna(df["Stock_Units"].mean())

In [43]:
# filling missing values of Brand with the most frequent value (mode), which is 'Apple' in this dataset
df['Brand'] = df['Brand'].fillna('Apple')

In [44]:
# check how many missing values are now
df.isnull().sum()

Product_ID           0
Product_Name         0
Brand                0
Category             0
Price                0
Stock_Units          0
Country_Of_Origin    0
Launch_Date          0
dtype: int64

- **Now there are no missing values in this data.**
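
The Brand fill above hard-codes the value `'Apple'`. A possible alternative, sketched here on a hypothetical four-row frame named `toy`, is to compute the mode instead so the code keeps working if the data changes.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Brand": ["Apple", "Apple", "HP", None],
    "Price": [100.0, 200.0, np.nan, 300.0],
})

# mode() returns a Series (there can be ties), so take the first entry
most_common_brand = toy["Brand"].mode()[0]
toy["Brand"] = toy["Brand"].fillna(most_common_brand)

# mean() skips NaN by default, so it is computed from the known prices only
toy["Price"] = toy["Price"].fillna(toy["Price"].mean())

print(toy)
```

Filling with the mean keeps the column average unchanged but places every filled row at exactly that value, which is why the mean appears as a Price in later outputs.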

### **9. describe()** - Used to get statistical summary of data

In [47]:
# used to get basic statistical summary of numerical data
df.describe()

Unnamed: 0,Product_ID,Price,Stock_Units
count,500.0,500.0,500.0
mean,1250.5,1022.956646,258.116667
std,144.481833,563.387959,141.535223
min,1001.0,52.68,0.0
25%,1125.75,533.4725,138.5
50%,1250.5,1022.956646,258.116667
75%,1375.25,1547.385,377.5
max,1500.0,1994.63,497.0


In [48]:
# get a statistical summary of numerical as well as categorical columns
df.describe(include = "all")

Unnamed: 0,Product_ID,Product_Name,Brand,Category,Price,Stock_Units,Country_Of_Origin,Launch_Date
count,500.0,500,500,500,500.0,500.0,500,500
unique,,10,8,3,,,6,440
top,,Laptop,Apple,Peripherals,,,UK,2020-06-22
freq,,62,85,175,,,90,3
mean,1250.5,,,,1022.956646,258.116667,,
std,144.481833,,,,563.387959,141.535223,,
min,1001.0,,,,52.68,0.0,,
25%,1125.75,,,,533.4725,138.5,,
50%,1250.5,,,,1022.956646,258.116667,,
75%,1375.25,,,,1547.385,377.5,,


- **Count :**
   - **Every column has 500 non-null values after removing duplicates and filling missing values (the raw file had 515 rows).**
- **Missing Values :**
   - **Before filling, there were 20 missing values in Brand.**
   - **20 missing values in Price.**
   - **20 missing values in Stock_Units.**
- **Unique Values :**
   - **There are 10 unique Product_Name values, 8 unique brands, 3 unique categories, 6 unique Country_Of_Origin values and 440 unique Launch_Date values.**
- **Most Frequent Values :**
   - **The most frequent Product_Name is Laptop, which occurs 62 times in the dataset.**
   - **The most frequent Brand is Apple, which occurs 85 times in the dataset (this includes the 20 filled values).**
   - **The most frequent Category is Peripherals, which occurs 175 times in the dataset.**
   - **The most frequent Country_Of_Origin is reported as UK, which occurs 90 times in the dataset (India also occurs 90 times).**
   - **The most frequent Launch_Date is 2020-06-22, which occurs 3 times in the dataset.**
- **Mean (Average) :**
   - **The average price of products is 1022.96, the minimum price is 52.68 and the maximum price is 1994.63.**
   - **The average stock is 258.12 units, the minimum is 0 and the maximum is 497.**
- **Standard Deviation :**
   - **The standard deviation is 563.39 for Price and 141.54 for Stock_Units. Both are large relative to their means, so the values are spread widely rather than clustered near the mean.**
- **Quartiles :**
   - **Product Price :**
       - **25% of the products cost 533.47 or less.**
       - **50% of the products cost 1022.96 or less (this is the median).**
       - **75% of the products cost 1547.39 or less.**
   - **Product Stock_Units :**
       - **25% of products have 138.5 or fewer units in stock.**
       - **50% have 258.12 or fewer units.**
       - **75% have 377.5 or fewer units.**

In [50]:
# check count of missing values
df.isnull().sum()    # filled all missing values

Product_ID           0
Product_Name         0
Brand                0
Category             0
Price                0
Stock_Units          0
Country_Of_Origin    0
Launch_Date          0
dtype: int64

### **10. unique()** - Used to get unique values from a specific column 

In [52]:
# get the unique values in brand column
df['Brand'].unique()

array(['Logitech', 'HP', 'Samsung', 'Canon', 'Apple', 'Asus', 'Lenovo',
       'Dell'], dtype=object)

In [53]:
# get the unique values in Product_Name column
df['Product_Name'].unique()

array(['Router', 'Laptop', 'Mouse', 'Keyboard', 'Tablet', 'Smartphone',
       'Headphones', 'Monitor', 'Webcam', 'Printer'], dtype=object)

In [54]:
# get the unique values in Category column
df['Category'].unique()

array(['Accessories', 'Peripherals', 'Electronics'], dtype=object)

In [55]:
# get the unique values in Country_Of_Origin column
df['Country_Of_Origin'].unique()

array(['Germany', 'UK', 'India', 'USA', 'Japan', 'China'], dtype=object)

### **11. nunique()** - Count the number of unique values in a column

In [57]:
# find the number of unique Country_Of_Origin
df['Country_Of_Origin'].nunique()

6

In [58]:
# find the number of unique Category
df['Category'].nunique()

3

In [59]:
# find the number of unique Product_Name
df['Product_Name'].nunique()

10

In [60]:
# find the number of unique Brand
df['Brand'].nunique()

8

### **12. value_counts()** - Count the number of occurrences of each unique value

In [62]:
# count the number of products of each Brand
df['Brand'].value_counts()

Brand
Apple       85
Lenovo      64
Samsung     61
Dell        60
Asus        59
Canon       58
Logitech    57
HP          56
Name: count, dtype: int64

In [63]:
# count the number of products of each category
df['Category'].value_counts()

Category
Peripherals    175
Accessories    168
Electronics    157
Name: count, dtype: int64

In [64]:
# count the number of products of each Product Name
df['Product_Name'].value_counts()

Product_Name
Laptop        62
Smartphone    55
Mouse         54
Keyboard      52
Monitor       51
Printer       51
Webcam        48
Tablet        47
Headphones    44
Router        36
Name: count, dtype: int64

In [65]:
# count the number of products from each country
df['Country_Of_Origin'].value_counts()

Country_Of_Origin
UK         90
India      90
Germany    87
USA        79
Japan      79
China      75
Name: count, dtype: int64

### **13. index** - Returns the row index (an attribute, so it is used without parentheses)

In [67]:
# get the row index
df.index    

Index([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,
       ...
       490, 491, 492, 493, 494, 495, 496, 497, 498, 499],
      dtype='int64', length=500)

### **14. len()** - Number of rows (entries) in the DataFrame

In [69]:
# get length of the dataframe
len(df)

500

### **15. sum()** - Used to calculate the sum of values in a column

In [71]:
# the comparison gives True/False for each row; sum() counts the True values (Tablet rows)
(df['Product_Name'] == 'Tablet').sum()

47

In [72]:
# count the rows where Country_Of_Origin is India
(df['Country_Of_Origin'] == 'India').sum()

90

### **16. info()** - Used to get a summary of the DataFrame

In [74]:
# show the index range, column names, non-null counts, datatypes and memory usage
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 500 entries, 0 to 499
Data columns (total 8 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Product_ID         500 non-null    int64  
 1   Product_Name       500 non-null    object 
 2   Brand              500 non-null    object 
 3   Category           500 non-null    object 
 4   Price              500 non-null    float64
 5   Stock_Units        500 non-null    float64
 6   Country_Of_Origin  500 non-null    object 
 7   Launch_Date        500 non-null    object 
dtypes: float64(2), int64(1), object(5)
memory usage: 35.2+ KB


### **17. max() / min()** - Returns maximum and minimum values

In [76]:
# what is maximum and minimum price of products
print("Max price is :", df['Price'].max())
print("Min price is :", df['Price'].min())

Max price is : 1994.63
Min price is : 52.68


### **18. rename()** - Rename the specified column or row

In [78]:
# rename Product_ID to ID
df.rename(columns = {'Product_ID' : 'ID'}, inplace = True)

In [79]:
# display first five rows
df.head()

Unnamed: 0,ID,Product_Name,Brand,Category,Price,Stock_Units,Country_Of_Origin,Launch_Date
0,1001,Router,Logitech,Accessories,574.15,188.0,Germany,2020-06-22
1,1002,Laptop,HP,Accessories,1644.89,203.0,UK,2023-10-15
2,1003,Mouse,Logitech,Peripherals,642.71,325.0,India,2019-02-22
3,1004,Keyboard,Samsung,Peripherals,888.96,439.0,USA,2020-12-22
4,1005,Keyboard,Canon,Electronics,1500.41,51.0,Japan,2019-11-27


### **19. sort_values()** - Sort values in specific order

In [81]:
# sort the records by Price of product descending
df.sort_values(by = "Price", ascending = False).head()

Unnamed: 0,ID,Product_Name,Brand,Category,Price,Stock_Units,Country_Of_Origin,Launch_Date
44,1045,Laptop,Apple,Electronics,1994.63,323.0,Germany,2023-12-30
121,1122,Webcam,HP,Accessories,1987.56,110.0,Germany,2020-09-13
482,1483,Mouse,Canon,Electronics,1983.72,447.0,USA,2020-08-19
184,1185,Webcam,Canon,Electronics,1979.61,386.0,USA,2022-04-06
17,1018,Monitor,Samsung,Accessories,1975.86,372.0,China,2022-09-16


In [82]:
# sort the records by stock units descending
df.sort_values(by = "Stock_Units", ascending = False).head()

Unnamed: 0,ID,Product_Name,Brand,Category,Price,Stock_Units,Country_Of_Origin,Launch_Date
367,1368,Monitor,Apple,Accessories,1177.95,497.0,Japan,2021-07-10
30,1031,Laptop,Samsung,Peripherals,1817.93,497.0,Germany,2021-11-03
170,1171,Webcam,Lenovo,Accessories,1022.956646,496.0,USA,2020-06-22
20,1021,Printer,Logitech,Electronics,272.0,496.0,India,2021-02-27
260,1261,Mouse,HP,Accessories,226.71,496.0,USA,2020-02-16


### **20. tail()** - View last rows

In [84]:
# display last 3 rows
df.tail(3)

Unnamed: 0,ID,Product_Name,Brand,Category,Price,Stock_Units,Country_Of_Origin,Launch_Date
497,1498,Headphones,Canon,Peripherals,644.31,300.0,China,2018-11-08
498,1499,Headphones,Lenovo,Accessories,816.06,232.0,India,2021-10-26
499,1500,Laptop,Dell,Accessories,1162.04,338.0,USA,2021-01-31


In [85]:
# display last 5 rows 
# by default tail displays last 5 rows
df.tail()

Unnamed: 0,ID,Product_Name,Brand,Category,Price,Stock_Units,Country_Of_Origin,Launch_Date
495,1496,Webcam,Apple,Peripherals,1679.25,364.0,USA,2021-08-05
496,1497,Mouse,Dell,Peripherals,1480.08,293.0,UK,2018-10-17
497,1498,Headphones,Canon,Peripherals,644.31,300.0,China,2018-11-08
498,1499,Headphones,Lenovo,Accessories,816.06,232.0,India,2021-10-26
499,1500,Laptop,Dell,Accessories,1162.04,338.0,USA,2021-01-31


### **21. Filtering** - Selecting rows that meet a condition

In [87]:
# filter the data to get the records where the product name is Webcam
df[df['Product_Name'] == 'Webcam'].head()

Unnamed: 0,ID,Product_Name,Brand,Category,Price,Stock_Units,Country_Of_Origin,Launch_Date
19,1020,Webcam,Dell,Electronics,1104.1,120.0,UK,2023-05-22
23,1024,Webcam,Apple,Accessories,549.76,256.0,USA,2018-11-17
35,1036,Webcam,Canon,Accessories,651.67,44.0,Japan,2018-11-21
59,1060,Webcam,Lenovo,Accessories,975.79,189.0,India,2018-04-23
78,1079,Webcam,Dell,Electronics,970.23,118.0,India,2022-10-18


In [88]:
# access the records with price > 1000
df[df['Price'] > 1000].head()

Unnamed: 0,ID,Product_Name,Brand,Category,Price,Stock_Units,Country_Of_Origin,Launch_Date
1,1002,Laptop,HP,Accessories,1644.89,203.0,UK,2023-10-15
4,1005,Keyboard,Canon,Electronics,1500.41,51.0,Japan,2019-11-27
5,1006,Router,Apple,Electronics,1575.61,256.0,India,2021-06-13
8,1009,Smartphone,Lenovo,Electronics,1548.6,238.0,China,2020-04-07
10,1011,Laptop,HP,Peripherals,1663.45,267.0,UK,2022-03-10


In [89]:
# access the records whose category is Accessories or Electronics
df[df['Category'].isin(['Accessories','Electronics'])].head()

Unnamed: 0,ID,Product_Name,Brand,Category,Price,Stock_Units,Country_Of_Origin,Launch_Date
0,1001,Router,Logitech,Accessories,574.15,188.0,Germany,2020-06-22
1,1002,Laptop,HP,Accessories,1644.89,203.0,UK,2023-10-15
4,1005,Keyboard,Canon,Electronics,1500.41,51.0,Japan,2019-11-27
5,1006,Router,Apple,Electronics,1575.61,256.0,India,2021-06-13
8,1009,Smartphone,Lenovo,Electronics,1548.6,238.0,China,2020-04-07
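
The filters above each test a single condition. As a sketch of how conditions can be combined, the example below uses `&` (and) and `~` (not) on a hypothetical four-row frame named `toy`.

```python
import pandas as pd

toy = pd.DataFrame({
    "Product_Name": ["Webcam", "Laptop", "Webcam", "Mouse"],
    "Price": [1104.10, 1644.89, 549.76, 642.71],
})

# Each comparison yields a boolean Series; & means "and", | means "or".
# The parentheses are required because & and | bind tighter than == and >.
expensive_webcams = toy[(toy["Product_Name"] == "Webcam") & (toy["Price"] > 1000)]
print(expensive_webcams)

# ~ negates a mask: every row that is not a Webcam
not_webcams = toy[~(toy["Product_Name"] == "Webcam")]
print(len(not_webcams))
```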


### **22. loc** - Used to access rows and columns with square brackets (label-based indexing)

In [91]:
# access product name, brand, category of all records
df.loc[:,['Product_Name','Brand','Category']].head()

Unnamed: 0,Product_Name,Brand,Category
0,Router,Logitech,Accessories
1,Laptop,HP,Accessories
2,Mouse,Logitech,Peripherals
3,Keyboard,Samsung,Peripherals
4,Keyboard,Canon,Electronics


In [92]:
# access first five records and their brand, category and price
df.loc[0:4,['Brand','Category','Price']]

Unnamed: 0,Brand,Category,Price
0,Logitech,Accessories,574.15
1,HP,Accessories,1644.89
2,Logitech,Peripherals,642.71
3,Samsung,Peripherals,888.96
4,Canon,Electronics,1500.41


In [93]:
# access 6 records with all columns (a loc slice includes the end label, so 0:5 returns rows 0 to 5)
df.loc[0:5,:]

Unnamed: 0,ID,Product_Name,Brand,Category,Price,Stock_Units,Country_Of_Origin,Launch_Date
0,1001,Router,Logitech,Accessories,574.15,188.0,Germany,2020-06-22
1,1002,Laptop,HP,Accessories,1644.89,203.0,UK,2023-10-15
2,1003,Mouse,Logitech,Peripherals,642.71,325.0,India,2019-02-22
3,1004,Keyboard,Samsung,Peripherals,888.96,439.0,USA,2020-12-22
4,1005,Keyboard,Canon,Electronics,1500.41,51.0,Japan,2019-11-27
5,1006,Router,Apple,Electronics,1575.61,256.0,India,2021-06-13


### **23. iloc** - Used to access rows and columns with square brackets (position-based indexing)

In [95]:
# access all records with all columns
df.iloc[:,:]

Unnamed: 0,ID,Product_Name,Brand,Category,Price,Stock_Units,Country_Of_Origin,Launch_Date
0,1001,Router,Logitech,Accessories,574.15,188.0,Germany,2020-06-22
1,1002,Laptop,HP,Accessories,1644.89,203.0,UK,2023-10-15
2,1003,Mouse,Logitech,Peripherals,642.71,325.0,India,2019-02-22
3,1004,Keyboard,Samsung,Peripherals,888.96,439.0,USA,2020-12-22
4,1005,Keyboard,Canon,Electronics,1500.41,51.0,Japan,2019-11-27
...,...,...,...,...,...,...,...,...
495,1496,Webcam,Apple,Peripherals,1679.25,364.0,USA,2021-08-05
496,1497,Mouse,Dell,Peripherals,1480.08,293.0,UK,2018-10-17
497,1498,Headphones,Canon,Peripherals,644.31,300.0,China,2018-11-08
498,1499,Headphones,Lenovo,Accessories,816.06,232.0,India,2021-10-26


In [96]:
# access the first 5 entries with the first 4 columns (an iloc slice excludes the end position)
df.iloc[:5 , :4]

Unnamed: 0,ID,Product_Name,Brand,Category
0,1001,Router,Logitech,Accessories
1,1002,Laptop,HP,Accessories
2,1003,Mouse,Logitech,Peripherals
3,1004,Keyboard,Samsung,Peripherals
4,1005,Keyboard,Canon,Electronics
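
The difference between the two indexers shows most clearly when row labels are not the same as row positions. The sketch below assumes a hypothetical frame `toy` whose index starts at 100.

```python
import pandas as pd

toy = pd.DataFrame(
    {"Price": [10, 20, 30, 40]},
    index=[100, 101, 102, 103],   # labels deliberately differ from positions
)

by_label = toy.loc[100:102]     # label slice: both ends are included
by_position = toy.iloc[0:2]     # position slice: the end is excluded

print(len(by_label), len(by_position))
```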


### **24. Add column**

In [98]:
# add a Tax column equal to 5% of the price
df['Tax'] = df['Price'] * (5/100)

In [99]:
# display dataset
df.head()

Unnamed: 0,ID,Product_Name,Brand,Category,Price,Stock_Units,Country_Of_Origin,Launch_Date,Tax
0,1001,Router,Logitech,Accessories,574.15,188.0,Germany,2020-06-22,28.7075
1,1002,Laptop,HP,Accessories,1644.89,203.0,UK,2023-10-15,82.2445
2,1003,Mouse,Logitech,Peripherals,642.71,325.0,India,2019-02-22,32.1355
3,1004,Keyboard,Samsung,Peripherals,888.96,439.0,USA,2020-12-22,44.448
4,1005,Keyboard,Canon,Electronics,1500.41,51.0,Japan,2019-11-27,75.0205


### **25. Drop Column**

In [101]:
# drop tax column
df.drop('Tax', axis = 1, inplace = True) 

In [102]:
# display dataset
df.head()

Unnamed: 0,ID,Product_Name,Brand,Category,Price,Stock_Units,Country_Of_Origin,Launch_Date
0,1001,Router,Logitech,Accessories,574.15,188.0,Germany,2020-06-22
1,1002,Laptop,HP,Accessories,1644.89,203.0,UK,2023-10-15
2,1003,Mouse,Logitech,Peripherals,642.71,325.0,India,2019-02-22
3,1004,Keyboard,Samsung,Peripherals,888.96,439.0,USA,2020-12-22
4,1005,Keyboard,Canon,Electronics,1500.41,51.0,Japan,2019-11-27


### **26. rank()** - Ranks the values in a column by assigning each one a numerical position

In [104]:
# rank entries by price, with the highest price ranked 1
df['Price'].rank(ascending = False).head()

0    370.0
1     95.0
2    349.0
3    286.0
4    140.0
Name: Price, dtype: float64
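
Ranks come back as floats because tied values share an averaged rank by default. A small sketch on a hypothetical Series named `prices` shows the default behaviour next to `method="min"`.

```python
import pandas as pd

prices = pd.Series([300, 100, 300, 200])

# default method="average": tied values share the mean of their positions
average_ranks = prices.rank(ascending=False)
print(average_ranks.tolist())

# method="min" gives every tied value the lowest rank in its group
min_ranks = prices.rank(ascending=False, method="min")
print(min_ranks.tolist())
```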

### **27. astype()** - Converts the datatype of column

In [106]:
# convert the Price column to integers (decimals are truncated); df itself is unchanged because the result is not assigned back
df['Price'].astype(int).head()

0     574
1    1644
2     642
3     888
4    1500
Name: Price, dtype: int32
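
Since `astype()` returns a new Series, the result has to be assigned to a column to be kept. The sketch below, on a hypothetical two-row frame `toy`, also shows rounding before conversion and parsing text dates with `pd.to_datetime()`, which is a different function from `astype()` and is included only as a related option for the `Launch_Date` column.

```python
import pandas as pd

toy = pd.DataFrame({
    "Price": [574.15, 1644.89],
    "Launch_Date": ["2020-06-22", "2023-10-15"],
})

# astype(int) drops the decimal part; assign the result to keep it
toy["Price_Int"] = toy["Price"].astype(int)

# round first if the nearest whole number is wanted instead
toy["Price_Rounded"] = toy["Price"].round().astype(int)

# dates stored as text can be parsed into real datetime values
toy["Launch_Date"] = pd.to_datetime(toy["Launch_Date"])

print(toy.dtypes)
```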

### **28. apply()** - Used to apply function to each row or column 

In [108]:
# create an Is_Laptop column: "Yes" if the product is a Laptop, "no" otherwise
df["Is_Laptop"] = df["Product_Name"].apply(lambda x : "Yes" if x == 'Laptop' else "no")

In [109]:
df.head()

Unnamed: 0,ID,Product_Name,Brand,Category,Price,Stock_Units,Country_Of_Origin,Launch_Date,Is_Laptop
0,1001,Router,Logitech,Accessories,574.15,188.0,Germany,2020-06-22,no
1,1002,Laptop,HP,Accessories,1644.89,203.0,UK,2023-10-15,Yes
2,1003,Mouse,Logitech,Peripherals,642.71,325.0,India,2019-02-22,no
3,1004,Keyboard,Samsung,Peripherals,888.96,439.0,USA,2020-12-22,no
4,1005,Keyboard,Canon,Electronics,1500.41,51.0,Japan,2019-11-27,no
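
For a simple yes/no flag like this one, `apply()` with a lambda is one option; NumPy's `np.where()` is a vectorized alternative that gives the same result. The sketch below compares both on a hypothetical three-row frame `toy`; the column names are made up for the comparison.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"Product_Name": ["Router", "Laptop", "Mouse"]})

# apply() calls the function once per value
toy["Is_Laptop_apply"] = toy["Product_Name"].apply(
    lambda name: "Yes" if name == "Laptop" else "No"
)

# np.where() evaluates the whole column in one vectorized step
toy["Is_Laptop_where"] = np.where(toy["Product_Name"] == "Laptop", "Yes", "No")

print(toy)
```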


### **29. query()** - Uses SQL-like expressions to filter data

In [111]:
# get the laptops that cost more than 1200
df.query("Product_Name == 'Laptop' and Price > 1200 ").head()

Unnamed: 0,ID,Product_Name,Brand,Category,Price,Stock_Units,Country_Of_Origin,Launch_Date,Is_Laptop
1,1002,Laptop,HP,Accessories,1644.89,203.0,UK,2023-10-15,Yes
10,1011,Laptop,HP,Peripherals,1663.45,267.0,UK,2022-03-10,Yes
30,1031,Laptop,Samsung,Peripherals,1817.93,497.0,Germany,2021-11-03,Yes
41,1042,Laptop,Dell,Accessories,1256.04,268.0,India,2023-05-06,Yes
44,1045,Laptop,Apple,Electronics,1994.63,323.0,Germany,2023-12-30,Yes
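
A `query()` string can also refer to a Python variable by prefixing it with `@`. The sketch below, using a hypothetical frame `toy` and threshold `min_price`, checks that the query returns the same rows as the equivalent boolean filter.

```python
import pandas as pd

toy = pd.DataFrame({
    "Product_Name": ["Laptop", "Laptop", "Mouse"],
    "Price": [1644.89, 900.00, 1500.00],
})

min_price = 1200

# @min_price refers to the Python variable defined above
result = toy.query("Product_Name == 'Laptop' and Price > @min_price")

# the same filter written with boolean masks
same = toy[(toy["Product_Name"] == "Laptop") & (toy["Price"] > min_price)]

print(result)
```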


### **30. nlargest()** - The n rows with the highest values in a column

In [113]:
# get the 4 records with the most stock units
df.nlargest(4,"Stock_Units")

Unnamed: 0,ID,Product_Name,Brand,Category,Price,Stock_Units,Country_Of_Origin,Launch_Date,Is_Laptop
30,1031,Laptop,Samsung,Peripherals,1817.93,497.0,Germany,2021-11-03,Yes
367,1368,Monitor,Apple,Accessories,1177.95,497.0,Japan,2021-07-10,no
20,1021,Printer,Logitech,Electronics,272.0,496.0,India,2021-02-27,no
170,1171,Webcam,Lenovo,Accessories,1022.956646,496.0,USA,2020-06-22,no


### **31. nsmallest()** - The n rows with the smallest values in a column

In [115]:
# get the 6 records with the lowest price
df.nsmallest(6,"Price")

Unnamed: 0,ID,Product_Name,Brand,Category,Price,Stock_Units,Country_Of_Origin,Launch_Date,Is_Laptop
375,1376,Monitor,Dell,Electronics,52.68,437.0,Japan,2019-12-30,no
255,1256,Webcam,Asus,Accessories,57.36,330.0,UK,2021-07-17,no
398,1399,Laptop,Apple,Peripherals,60.53,231.0,India,2019-07-07,Yes
382,1383,Monitor,Asus,Peripherals,61.24,327.0,China,2021-05-26,no
39,1040,Headphones,Lenovo,Peripherals,63.92,263.0,China,2020-08-19,no
146,1147,Mouse,Samsung,Electronics,67.78,121.0,Japan,2021-01-26,no


### **32. groupby()** - Used to group data based on one or more columns and then apply an aggregate function

In [117]:
# what is the average price for each category
df.groupby('Category')['Price'].mean()

Category
Accessories     999.261804
Electronics    1005.506518
Peripherals    1061.358952
Name: Price, dtype: float64

In [118]:
# what is the average price for each product
df.groupby('Product_Name')['Price'].mean()

Product_Name
Headphones    1034.316438
Keyboard      1034.766794
Laptop        1078.148709
Monitor        942.304965
Mouse          989.177345
Printer       1021.840000
Router         939.935833
Smartphone    1007.308303
Tablet        1067.435461
Webcam        1089.983817
Name: Price, dtype: float64
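
The examples above compute one statistic per group. As a sketch, `agg()` with named aggregation can return several at once; the frame `toy` and the output column names `avg_price`, `max_price` and `n_products` are hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({
    "Category": ["Accessories", "Accessories", "Electronics", "Electronics"],
    "Price": [100.0, 300.0, 200.0, 600.0],
})

# Named aggregation: each keyword becomes an output column
summary = toy.groupby("Category")["Price"].agg(
    avg_price="mean",
    max_price="max",
    n_products="count",
)

print(summary)
```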

### **33. mean()** - Used to get average of datapoints

In [120]:
# get average of price column
df['Price'].mean()

1022.9566458333334

In [121]:
# get average of stock unit column
df['Stock_Units'].mean()

258.1166666666667

### **34. mode()** - Used to get most frequent value

In [123]:
# get most frequent value of category
df['Category'].mode()

0    Peripherals
Name: Category, dtype: object

## **Correlation** - Correlation measures how two variables change together.

- **Correlation is the relationship between two or more variables.**
- **It shows how one variable tends to change when another changes; on its own it does not prove that one causes the other.**
- **If temperature increases, and ice cream sales also increase, they have a positive correlation.**
- **If price increases, and demand decreases, they have a negative correlation.**

### **Types of correlations -**

- **Positive Correlation** – Both things increase or decrease together.

    Example: Height and weight.

- **Negative Correlation** – One increases, the other decreases.

    Example: Price and demand.

- **Zero Correlation** – No connection between the two.

    Example: Shoe size and intelligence.

### **35. corr()** - Used to get correlation between numerical columns

In [125]:
num_columns = ['Price','Stock_Units']

In [126]:
correlation = df[num_columns].corr()
print(correlation)

                Price  Stock_Units
Price        1.000000    -0.000777
Stock_Units -0.000777     1.000000


- **The correlation value (-0.000777) is very close to zero, which indicates practically no linear correlation between Price and Stock_Units.**
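
For contrast with the near-zero result, the sketch below builds a hypothetical frame `toy` with made-up numbers in which one column rises with temperature and another falls, so `corr()` returns values at the two extremes.

```python
import pandas as pd

toy = pd.DataFrame({
    "temperature": [20, 25, 30, 35],
    "ice_cream_sales": [40, 50, 60, 70],   # rises with temperature
    "heater_sales": [70, 60, 50, 40],      # falls as temperature rises
})

# corr() computes the Pearson correlation between every pair of numeric columns
corr_matrix = toy.corr()
print(corr_matrix.round(2))
```

Correlation values range from -1 (perfect negative) through 0 (no linear relationship) to 1 (perfect positive).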