# Libraries

![Top-10-Python-Libraries-3](https://github.com/user-attachments/assets/37bac531-a84c-4c78-9956-43bafb4646f9)

In programming and data science, libraries are collections of pre-written code that developers reuse instead of writing common functionality from scratch, which speeds up development. They package functions and utilities for specific tasks, such as:
- **Data Manipulation**: Performing operations on data structures (e.g., pandas).
- **Numerical Computation**: Handling mathematical operations and numerical data (e.g., NumPy, SciPy).
- **Machine Learning**: Building and training machine learning models (e.g., Scikit-learn, TensorFlow, PyTorch).
- **Data Visualization**: Creating graphs and plots to visualize data (e.g., Matplotlib, Seaborn, Plotly).
- **Natural Language Processing**: Working with text data for tasks like tokenization, parsing, and sentiment analysis (e.g., NLTK, spaCy).
- **Web Scraping**: Extracting data from web pages (e.g., BeautifulSoup, Scrapy).


# Working with Pandas
![download](https://github.com/user-attachments/assets/5f58b2a1-3bdd-41b9-ba7c-bc46a818d24b)

Pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. 

#### Key Features of Pandas
1. Data Structures:

    - Series: A one-dimensional labeled array capable of holding any data type.
    - DataFrame: A two-dimensional labeled data structure with columns of potentially different types.
2. Data Handling:

    - Importing and exporting data from various file formats (CSV, Excel, SQL databases, etc.).
    - Handling missing data (NaN).
    - Label-based slicing, indexing, and subsetting of large datasets.
    - Grouping data and performing aggregate operations.
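The features listed above can be sketched in a few lines. The `sales` table, its column names, and its values are hypothetical, invented only for illustration; the pandas calls themselves (`isna`, `.loc`, `groupby`) are standard.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for illustration only
sales = pd.DataFrame(
    {
        "region": ["North", "South", "North", "South"],
        "units": [10, np.nan, 30, 40],
    },
    index=["a", "b", "c", "d"],
)

# Series: a single labeled column taken from the DataFrame
units = sales["units"]

# Handling missing data: count the NaN entries in the column
missing_units = int(units.isna().sum())

# Label-based slicing: .loc includes both endpoints of the slice
subset = sales.loc["b":"c"]

# Grouping and aggregating: sum() skips NaN by default
units_by_region = sales.groupby("region")["units"].sum()

print(missing_units)
print(subset)
print(units_by_region)
```

Each of these operations is covered in more detail in the sections that follow.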

In [2]:
# install pandas
!pip install pandas



In [3]:
import pandas

## Series

In [4]:
data = [10, 20, 30, 40, 50]

In [6]:
x = pandas.Series(data)

In [7]:
x

0    10
1    20
2    30
3    40
4    50
dtype: int64

In [11]:
# import pandas under its conventional alias, pd
import pandas as pd

In [12]:
country = ['Nigeria', 'Ghana', 'South Africa']
country = pd.Series(country)

In [13]:
country

0         Nigeria
1           Ghana
2    South Africa
dtype: object

In [16]:
# custom index
marks = [67, 58, 93, 75]
subjects = ['Maths', 'English', 'Science', 'Hindi']

mark_series = pd.Series(marks, index=subjects)

In [17]:
mark_series

Maths      67
English    58
Science    93
Hindi      75
dtype: int64
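A custom index is mainly useful because values can then be looked up by label. The sketch below rebuilds `mark_series` from the cell above so it runs on its own; the variable names `science_mark`, `first_mark`, and `above_70` are introduced here for illustration.

```python
import pandas as pd

marks = [67, 58, 93, 75]
subjects = ['Maths', 'English', 'Science', 'Hindi']
mark_series = pd.Series(marks, index=subjects)

# Look up a value by its index label
science_mark = mark_series['Science']

# Look up a value by its integer position
first_mark = mark_series.iloc[0]

# Keep only the entries that satisfy a condition
above_70 = mark_series[mark_series > 70]

print(science_mark)
print(first_mark)
print(above_70)
```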

## DataFrame

In [18]:
# create a dataframe from a dictionary
data = {
    'Name': ['Alice', 'Barnabas', 'Catherine', 'Dorris'],
    'Age': [28, 29, 35, 26],
    'City': ['New York', 'Los Angeles', 'Cairo', 'Paris']
}

In [19]:
data

{'Name': ['Alice', 'Barnabas', 'Catherine', 'Dorris'],
 'Age': [28, 29, 35, 26],
 'City': ['New York', 'Los Angeles', 'Cairo', 'Paris']}

In [20]:
df = pd.DataFrame(data)
df

|   | Name | Age | City |
|---|------|-----|------|
| 0 | Alice | 28 | New York |
| 1 | Barnabas | 29 | Los Angeles |
| 2 | Catherine | 35 | Cairo |
| 3 | Dorris | 26 | Paris |


In [21]:
# Add a new column -- Gender
df['Gender'] = ['Female', 'Male', 'Female', 'Male']

In [22]:
df

|   | Name | Age | City | Gender |
|---|------|-----|------|--------|
| 0 | Alice | 28 | New York | Female |
| 1 | Barnabas | 29 | Los Angeles | Male |
| 2 | Catherine | 35 | Cairo | Female |
| 3 | Dorris | 26 | Paris | Male |
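Once a DataFrame exists, the usual next step is to pull out columns and rows. The following is a minimal sketch that rebuilds the same table so it is self-contained; the names `name_age`, `over_27`, and `second_row` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Barnabas', 'Catherine', 'Dorris'],
    'Age': [28, 29, 35, 26],
    'City': ['New York', 'Los Angeles', 'Cairo', 'Paris'],
})
df['Gender'] = ['Female', 'Male', 'Female', 'Male']

# A single column is returned as a Series
names = df['Name']

# A list of column names returns a smaller DataFrame
name_age = df[['Name', 'Age']]

# A boolean condition keeps only the matching rows
over_27 = df[df['Age'] > 27]

# .loc selects a row by its index label (here the default labels 0..3)
second_row = df.loc[1]

print(name_age)
print(over_27)
print(second_row)
```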


## Data Handling 

#### Import data in pandas
Data comes in different formats, such as:
- CSV
- TSV
- JSON
- SQL

In [25]:
# reading a csv file into a dataframe
df = pd.read_csv('locations.csv')

In [26]:
df

Unnamed: 0,location,continent,population,life_expectancy,hospital_beds_per_thousand,gdp_per_capita
0,Afghanistan,Asia,3.892834e+07,64.83,0.500,1803.987
1,Albania,Europe,2.877800e+06,78.57,2.890,11803.431
2,Algeria,Africa,4.385104e+07,76.88,1.900,13913.839
3,Andorra,Europe,7.726500e+04,83.73,,
4,Angola,Africa,3.286627e+07,61.15,,5819.495
...,...,...,...,...,...,...
207,Yemen,Asia,2.982597e+07,66.12,0.700,1479.147
208,Zambia,Africa,1.838396e+07,63.89,2.000,3689.251
209,Zimbabwe,Africa,1.486293e+07,61.49,1.700,1899.775
210,World,,7.794799e+09,72.58,2.705,15469.207


In [None]:
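# specify the full path -- "C:\Users\User\Documents\Python\Web Scraping on IMDB Movie\imdb_top_250_movies.csv"
# in code, write it with forward slashes or as a raw string (r"...") so the backslashes are not treated as escape sequences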

In [28]:
movie_df = pd.read_csv("C:/Users/User/Documents/Python/Web Scraping on IMDB Movie/imdb_top_250_movies.csv")

In [29]:
movie_df

Unnamed: 0,Title,Release Date,Ratings,Ranking,Runtime,Popularity
0,The Shawshank Redemption,1994,9.3,1,2h 22m,2.9M
1,The Godfather,1972,9.2,2,2h 55m,2M
2,The Dark Knight,2008,9.0,3,2h 32m,2.9M
3,The Godfather Part II,1974,9.0,4,3h 22m,1.4M
4,12 Angry Men,1957,9.0,5,1h 36m,867K
...,...,...,...,...,...,...
245,It Happened One Night,1934,8.1,246,1h 45m,112K
246,Aladdin,1992,8.0,247,1h 30m,468K
247,Drishyam,2015,8.2,248,2h 43m,95K
248,Dances with Wolves,1990,8.0,249,3h 1m,291K


#### Read a file from a URL

In [30]:
url = "https://raw.githubusercontent.com/Oyeniran20/Machine-Learning/main/6.%20Trees/housing.csv"

In [31]:
url

'https://raw.githubusercontent.com/Oyeniran20/Machine-Learning/main/6.%20Trees/housing.csv'

In [34]:
data = pd.read_csv(url)

In [35]:
data

Unnamed: 0,longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value,ocean_proximity
0,-122.23,37.88,41.0,880.0,129.0,322.0,126.0,8.3252,452600.0,NEAR BAY
1,-122.22,37.86,21.0,7099.0,1106.0,2401.0,1138.0,8.3014,358500.0,NEAR BAY
2,-122.24,37.85,52.0,1467.0,190.0,496.0,177.0,7.2574,352100.0,NEAR BAY
3,-122.25,37.85,52.0,1274.0,235.0,558.0,219.0,5.6431,341300.0,NEAR BAY
4,-122.25,37.85,52.0,1627.0,280.0,565.0,259.0,3.8462,342200.0,NEAR BAY
...,...,...,...,...,...,...,...,...,...,...
20635,-121.09,39.48,25.0,1665.0,374.0,845.0,330.0,1.5603,78100.0,INLAND
20636,-121.21,39.49,18.0,697.0,150.0,356.0,114.0,2.5568,77100.0,INLAND
20637,-121.22,39.43,17.0,2254.0,485.0,1007.0,433.0,1.7000,92300.0,INLAND
20638,-121.32,39.43,18.0,1860.0,409.0,741.0,349.0,1.8672,84700.0,INLAND


#### The `sep` Parameter

In [38]:
col_name = ['sn', 'title', 'release_year', 'ratings', 'vote_count', 'genre']
df = pd.read_csv('movie_titles_metadata.tsv', sep='\t', names=col_name)
df

Unnamed: 0,sn,title,release_year,ratings,vote_count,genre
0,m0,10 things i hate about you,1999,6.9,62847.0,['comedy' 'romance']
1,m1,1492: conquest of paradise,1992,6.2,10421.0,['adventure' 'biography' 'drama' 'history']
2,m2,15 minutes,2001,6.1,25854.0,['action' 'crime' 'drama' 'thriller']
3,m3,2001: a space odyssey,1968,8.4,163227.0,['adventure' 'mystery' 'sci-fi']
4,m4,48 hrs.,1982,6.9,22289.0,['action' 'comedy' 'crime' 'drama' 'thriller']
...,...,...,...,...,...,...
612,m612,watchmen,2009,7.8,135229.0,['action' 'crime' 'fantasy' 'mystery' 'sci-fi'...
613,m613,xxx,2002,5.6,53505.0,['action' 'adventure' 'crime']
614,m614,x-men,2000,7.4,122149.0,['action' 'sci-fi']
615,m615,young frankenstein,1974,8.0,57618.0,['comedy' 'sci-fi']
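The effect of `sep` and `names` can be checked without any file on disk by reading from an in-memory string. The two-row TSV text below is hypothetical and much smaller than `movie_titles_metadata.tsv`; it only mimics a tab-separated file with no header line.

```python
import io

import pandas as pd

# Hypothetical tab-separated text with no header line
tsv_text = "m0\tfirst film\t1999\nm1\tsecond film\t1992\n"

col_names = ['sn', 'title', 'release_year']

# sep='\t' splits on tabs; names= supplies the column names
movies = pd.read_csv(io.StringIO(tsv_text), sep='\t', names=col_names)

# Without names=, pandas treats the first data row as the header,
# so one row of data is lost
no_names = pd.read_csv(io.StringIO(tsv_text), sep='\t')

print(movies)
print(no_names.shape)
```

Passing `names` is what keeps the first record from being consumed as column labels.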


In [65]:
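# header=1 uses the second line of the file (row index 1) as the column names; the line before it is skipped
df1 = pd.read_csv('test.csv', header=1)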
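The contents of `test.csv` are not shown here, so the sketch below uses a hypothetical in-memory file with one extra line above the real header to illustrate what `header=1` does.

```python
import io

import pandas as pd

# Hypothetical file: one line of preamble, then the real header, then data
csv_text = "Exported report,draft\nname,score\nAda,90\nGrace,85\n"

# header=1: the line at row index 1 ("name,score") becomes the column names,
# and the line before it is ignored
report = pd.read_csv(io.StringIO(csv_text), header=1)

print(report)
```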

In [76]:
pd.read_excel('exchange_rate.xlsx')

Unnamed: 0.1,Unnamed: 0,provider,WARNING_UPGRADE_TO_V6,terms,base,date,time_last_updated,rates
0,INR,https://www.exchangerate-api.com,https://www.exchangerate-api.com/docs/free,https://www.exchangerate-api.com/terms,INR,2024-12-06,1733443201,1.0000
1,AED,https://www.exchangerate-api.com,https://www.exchangerate-api.com/docs/free,https://www.exchangerate-api.com/terms,INR,2024-12-06,1733443201,0.0433
2,AFN,https://www.exchangerate-api.com,https://www.exchangerate-api.com/docs/free,https://www.exchangerate-api.com/terms,INR,2024-12-06,1733443201,0.8080
3,ALL,https://www.exchangerate-api.com,https://www.exchangerate-api.com/docs/free,https://www.exchangerate-api.com/terms,INR,2024-12-06,1733443201,1.1000
4,AMD,https://www.exchangerate-api.com,https://www.exchangerate-api.com/docs/free,https://www.exchangerate-api.com/terms,INR,2024-12-06,1733443201,4.7500
...,...,...,...,...,...,...,...,...
157,XPF,https://www.exchangerate-api.com,https://www.exchangerate-api.com/docs/free,https://www.exchangerate-api.com/terms,INR,2024-12-06,1733443201,1.3400
158,YER,https://www.exchangerate-api.com,https://www.exchangerate-api.com/docs/free,https://www.exchangerate-api.com/terms,INR,2024-12-06,1733443201,2.9500
159,ZAR,https://www.exchangerate-api.com,https://www.exchangerate-api.com/docs/free,https://www.exchangerate-api.com/terms,INR,2024-12-06,1733443201,0.2130
160,ZMW,https://www.exchangerate-api.com,https://www.exchangerate-api.com/docs/free,https://www.exchangerate-api.com/terms,INR,2024-12-06,1733443201,0.3200


In [42]:
pd.read_json('train.json')

Unnamed: 0,id,cuisine,ingredients
0,10259,greek,"[romaine lettuce, black olives, grape tomatoes..."
1,25693,southern_us,"[plain flour, ground pepper, salt, tomatoes, g..."
2,20130,filipino,"[eggs, pepper, salt, mayonaise, cooking oil, g..."
3,22213,indian,"[water, vegetable oil, wheat, salt]"
4,13162,indian,"[black pepper, shallots, cornflour, cayenne pe..."
...,...,...,...
39769,29109,irish,"[light brown sugar, granulated sugar, butter, ..."
39770,11462,italian,"[KRAFT Zesty Italian Dressing, purple onion, b..."
39771,2238,irish,"[eggs, citrus fruit, raisins, sourdough starte..."
39772,41882,chinese,"[boneless chicken skinless thigh, minced garli..."


### Important Functions and Attributes

In [50]:
# preview the first 5 rows
df.head()

Unnamed: 0,sn,title,release_year,ratings,vote_count,genre
0,m0,10 things i hate about you,1999,6.9,62847.0,['comedy' 'romance']
1,m1,1492: conquest of paradise,1992,6.2,10421.0,['adventure' 'biography' 'drama' 'history']
2,m2,15 minutes,2001,6.1,25854.0,['action' 'crime' 'drama' 'thriller']
3,m3,2001: a space odyssey,1968,8.4,163227.0,['adventure' 'mystery' 'sci-fi']
4,m4,48 hrs.,1982,6.9,22289.0,['action' 'comedy' 'crime' 'drama' 'thriller']


In [51]:
# last 5 rows
df.tail()

Unnamed: 0,sn,title,release_year,ratings,vote_count,genre
612,m612,watchmen,2009,7.8,135229.0,['action' 'crime' 'fantasy' 'mystery' 'sci-fi'...
613,m613,xxx,2002,5.6,53505.0,['action' 'adventure' 'crime']
614,m614,x-men,2000,7.4,122149.0,['action' 'sci-fi']
615,m615,young frankenstein,1974,8.0,57618.0,['comedy' 'sci-fi']
616,m616,zulu dawn,1979,6.4,1911.0,['action' 'adventure' 'drama' 'history' 'war']


In [59]:
# 5 randomly selected rows (the result changes on every run)
df.sample(5)

Unnamed: 0,sn,title,release_year,ratings,vote_count,genre
186,m186,smokin' aces,2006,6.6,58048.0,['action' 'crime' 'drama' 'thriller']
326,m326,do the right thing,1989,7.9,27164.0,['drama']
483,m483,maniac,1980,6.2,3382.0,['drama' 'horror' 'thriller']
549,m549,the terminator,1984,8.1,183538.0,['action' 'sci-fi' 'thriller']
105,m105,jackie brown,1997,7.6,85496.0,['crime' 'drama' 'thriller']


In [60]:
# (number of rows, number of columns)
df.shape

(617, 6)

In [61]:
# column names, non-null counts, and dtypes
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 617 entries, 0 to 616
Data columns (total 6 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   sn            617 non-null    object 
 1   title         616 non-null    object 
 2   release_year  616 non-null    object 
 3   ratings       616 non-null    float64
 4   vote_count    616 non-null    float64
 5   genre         616 non-null    object 
dtypes: float64(2), object(4)
memory usage: 29.1+ KB


In [64]:
# summary statistics
df.describe()

|       | ratings | vote_count |
|-------|---------|------------|
| count | 616.0 | 616.0 |
| mean | 6.865584 | 49901.698052 |
| std | 1.215463 | 61898.367352 |
| min | 2.5 | 9.0 |
| 25% | 6.2 | 9992.5 |
| 50% | 7.0 | 27121.5 |
| 75% | 7.8 | 66890.0 |
| max | 9.3 | 419312.0 |
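Note that the summary above covers only `ratings` and `vote_count`: by default `describe()` reports on numeric columns only. A small sketch with a hypothetical three-row `frame` shows the difference when `include='all'` is passed.

```python
import pandas as pd

# Hypothetical data, invented for illustration only
frame = pd.DataFrame({
    'title': ['a', 'b', 'c'],
    'ratings': [6.0, 7.0, 8.0],
})

# Default: numeric columns only
numeric_summary = frame.describe()

# include='all' also summarises text columns (count, unique, top, freq)
all_summary = frame.describe(include='all')

print(numeric_summary)
print(all_summary)
```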


In [69]:
# count the missing values in each column of the housing data
data.isnull().sum()

longitude               0
latitude                0
housing_median_age      0
total_rooms             0
total_bedrooms        207
population              0
households              0
median_income           0
median_house_value      0
ocean_proximity         0
dtype: int64

In [70]:
# isna() is an alias of isnull(); both give the same counts
df.isna().sum()

sn              0
title           1
release_year    1
ratings         1
vote_count      1
genre           1
dtype: int64

In [72]:
import numpy as np

In [73]:
# Sample DataFrame with missing values
data = {
    'A': [1, 2, 3, 4], 
    'B': [5, np.nan, np.nan, 8], 
    'C': [10, 11, 12, np.nan]
}
df = pd.DataFrame(data)

In [74]:
df

|   | A | B | C |
|---|---|-----|------|
| 0 | 1 | 5.0 | 10.0 |
| 1 | 2 | NaN | 11.0 |
| 2 | 3 | NaN | 12.0 |
| 3 | 4 | 8.0 | NaN |


In [75]:
df.isna().sum()

A    0
B    2
C    1
dtype: int64
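Counting missing values is usually followed by deciding what to do with them. The sketch below shows two common options, dropping incomplete rows with `dropna()` and filling gaps with `fillna()`, on the same sample DataFrame. Which option is appropriate depends on the data; filling with the column mean is only one possible choice.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, np.nan, np.nan, 8],
    'C': [10, 11, 12, np.nan],
})

# Drop every row that contains at least one NaN (only row 0 is complete)
complete_rows = df.dropna()

# Replace every NaN with a constant
filled_zero = df.fillna(0)

# Replace every NaN with the mean of its own column
filled_mean = df.fillna(df.mean())

print(complete_rows)
print(filled_zero)
print(filled_mean)
```

None of these calls modify `df` itself; each returns a new DataFrame.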