# EDA Assignment - 2


### Flight Price Dataset:

#### Q1. Load the flight price dataset and examine its dimensions.
```python
import pandas as pd

# Load the dataset
flight_price_df = pd.read_csv('flight_price_dataset.csv')

# Examine dimensions
rows, columns = flight_price_df.shape
print(f"Number of rows: {rows}, Number of columns: {columns}")
```

#### Q2-Q8. Analyzing Flight Price Dataset:
Questions 2-8 cover the distribution and range of prices, airline comparison, outliers, peak travel season, trends, and factors affecting flight prices. Each is answered through exploratory data analysis (EDA): descriptive statistics plus visualizations such as histograms, boxplots, and scatter plots.

### Google Playstore Dataset:

#### Q9. Load the Google Playstore dataset and examine its dimensions.
```python
# Load the dataset
playstore_df = pd.read_csv('google_playstore_dataset.csv')

# Examine dimensions
rows_playstore, columns_playstore = playstore_df.shape
print(f"Number of rows: {rows_playstore}, Number of columns: {columns_playstore}")
```

#### Q10-Q17. Analyzing Google Playstore Dataset:
As with the Flight Price dataset, questions 10-17 are answered with EDA techniques such as boxplots, scatter plots, bar charts, and frequency tables. They cover category-wise ratings, missing values, the relationship between app size and rating, app type and price, top apps, successful app developers, and the best time to launch an app.

### General Approach for Analysis:
1. **Data Cleaning:** Handle missing values, outliers, and format issues.
2. **Exploratory Data Analysis (EDA):** Visualize distributions, relationships, and trends.
3. **Descriptive Statistics:** Calculate summary statistics for numerical features.
4. **Data Visualization:** Use appropriate plots and charts for each analysis.
5. **In-depth Analysis:** Interpret findings, identify patterns, and derive insights.
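
The steps above can be sketched on a tiny made-up table. This is only an illustration under stated assumptions: `toy_df` and its values are hypothetical and do not come from either dataset, and dropping rows with a missing price is just one possible cleaning choice.

```python
import pandas as pd

# Hypothetical toy data standing in for a real dataset
toy_df = pd.DataFrame({
    "Airline": ["A", "A", "B", "B", "B"],
    "Price": [4000.0, None, 5200.0, 4800.0, 5000.0],
})

# Step 1 (data cleaning): drop rows with a missing price
clean_df = toy_df.dropna(subset=["Price"])

# Step 3 (descriptive statistics): summary of the numeric column
price_summary = clean_df["Price"].describe()

# Steps 2 and 4: a grouped view that a bar chart or boxplot would visualize
mean_by_airline = clean_df.groupby("Airline")["Price"].mean()

print(price_summary)
print(mean_by_airline)
```

Step 5, interpreting the output, stays manual: the grouped means only suggest where to look next.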

### Flight Price Dataset (Questions 2-8):

#### Q2. Distribution of Flight Prices:
```python
import matplotlib.pyplot as plt

# Visualization of flight prices distribution
plt.hist(flight_price_df['Price'], bins=20, color='blue', alpha=0.7)
plt.title('Distribution of Flight Prices')
plt.xlabel('Flight Price')
plt.ylabel('Frequency')
plt.show()
```

#### Q3. Range of Prices:
```python
# Range of flight prices
min_price = flight_price_df['Price'].min()
max_price = flight_price_df['Price'].max()

print(f"Minimum Price: {min_price}, Maximum Price: {max_price}")
```

#### Q4. Flight Prices by Airline (Boxplot):
```python
import seaborn as sns

# Boxplot to compare prices by airline
plt.figure(figsize=(12, 6))
sns.boxplot(x='Airline', y='Price', data=flight_price_df)
plt.title('Flight Prices by Airline')
plt.xlabel('Airline')
plt.ylabel('Flight Price')
plt.xticks(rotation=45)
plt.show()
```

#### Q5. Identify Outliers (Boxplot):
```python
# Boxplot to identify outliers
plt.figure(figsize=(8, 6))
sns.boxplot(x=flight_price_df['Price'])
plt.title('Boxplot of Flight Prices')
plt.xlabel('Flight Price')
plt.show()
```
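
The boxplot shows outliers visually, but it does not list them. A minimal sketch of the same 1.5 × IQR rule that seaborn's boxplot whiskers use by default is given below. The helper name `iqr_outliers` and the sample prices are hypothetical; on the real data the function would be called on `flight_price_df['Price']`.

```python
import pandas as pd

def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the entries lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return values[(values < lower) | (values > upper)]

# Hypothetical sample prices; only the last value is far from the rest
sample_prices = pd.Series([3000, 3200, 3500, 3700, 4000, 4100, 25000])
outliers = iqr_outliers(sample_prices)
print(outliers)
```

Whether such points are errors or genuine high fares (for example, premium cabins) is a judgment call that the rule itself cannot make.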

#### Q6-Q8. Analyzing Seasonal Trends:
*Assuming there's a 'Date' column indicating the date of the flight.*
```python
# Convert 'Date' column to datetime
flight_price_df['Date'] = pd.to_datetime(flight_price_df['Date'])

# Extract month from the date
flight_price_df['Month'] = flight_price_df['Date'].dt.month

# Analyzing seasonal trends
plt.figure(figsize=(12, 6))
# errorbar=None replaces ci=None, which is deprecated since seaborn 0.12
sns.lineplot(x='Month', y='Price', data=flight_price_df, errorbar=None)
plt.title('Seasonal Trends in Flight Prices')
plt.xlabel('Month')
plt.ylabel('Flight Price')
plt.show()
```
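
To name a peak month rather than read it off the line chart, the monthly averages can be computed directly. The sketch below uses a hypothetical `toy_flights` table with invented dates and prices; it assumes, as above, that a 'Date' column exists.

```python
import pandas as pd

# Hypothetical flights; dates and prices are made up for illustration
toy_flights = pd.DataFrame({
    "Date": pd.to_datetime([
        "2019-03-01", "2019-03-15", "2019-05-10", "2019-05-20", "2019-06-05",
    ]),
    "Price": [4000, 4400, 6000, 6400, 5000],
})

# Average price per calendar month, then the month with the highest average
monthly_mean = toy_flights.groupby(toy_flights["Date"].dt.month)["Price"].mean()
peak_month = monthly_mean.idxmax()

print(monthly_mean)
print(f"Month with the highest average price: {peak_month}")
```

A high average price is only a proxy for peak season; the number of flights per month is a useful second check.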

### Google Playstore Dataset (Questions 10-17):

#### Q10. Ratings by Category (Boxplot):
```python
# Boxplot to compare ratings by category
plt.figure(figsize=(12, 6))
sns.boxplot(x='Category', y='Rating', data=playstore_df)
plt.title('App Ratings by Category')
plt.xlabel('App Category')
plt.ylabel('Rating')
plt.xticks(rotation=45)
plt.show()
```

#### Q11. Identify Missing Values:
```python
# Check for missing values
missing_values = playstore_df.isnull().sum()
print("Missing Values:\n", missing_values)
```

#### Q12. App Size vs. Rating (Scatter Plot):
```python
# Scatter plot for app size vs rating
plt.figure(figsize=(10, 6))
plt.scatter(playstore_df['Size'], playstore_df['Rating'], alpha=0.5)
plt.title('App Size vs Rating')
plt.xlabel('App Size')
plt.ylabel('Rating')
plt.show()
```
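
The scatter plot above needs a numeric 'Size' column. If the column instead holds strings such as `'19M'`, `'201k'`, or `'Varies with device'` (an assumption about the raw file, not something confirmed here), a conversion step is needed first. A minimal sketch with a hypothetical helper `parse_size`:

```python
import math

def parse_size(raw) -> float:
    """Convert a size such as '19M' or '512k' to megabytes.

    Values without a suffix are assumed to be megabytes already.
    Anything that cannot be parsed (e.g. 'Varies with device') becomes NaN.
    """
    text = str(raw).strip()
    multipliers = {"M": 1.0, "k": 1.0 / 1024}
    if text and text[-1] in multipliers:
        number, factor = text[:-1], multipliers[text[-1]]
    else:
        number, factor = text, 1.0
    try:
        return float(number) * factor
    except ValueError:
        return math.nan

print(parse_size("19M"))
print(parse_size("512k"))
print(parse_size("Varies with device"))
```

It could be applied with `playstore_df['Size'].map(parse_size)` and the result stored in a new column before plotting.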

#### Q13. App Type vs. Price (Bar Chart):
```python
# Bar chart for app type vs price
# Note: the mean is only meaningful if 'Price' is numeric
avg_price_by_type = playstore_df.groupby('Type')['Price'].mean()
avg_price_by_type.plot(kind='bar', color='green', alpha=0.7)
plt.title('Average App Price by Type')
plt.xlabel('App Type')
plt.ylabel('Average Price')
plt.show()
```

#### Q14. Top 10 Most Popular Apps (Frequency Table):
```python
# Frequency table for top 10 most installed apps
top_apps = playstore_df.nlargest(10, 'Installs')[['App', 'Installs']]
print("Top 10 Most Popular Apps:\n", top_apps)
```
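
`nlargest` expects a numeric 'Installs' column. If the raw values are strings such as `'10,000+'` (again an assumption about the file), they need to be converted first. A minimal sketch with a hypothetical helper `parse_installs`:

```python
def parse_installs(raw) -> int:
    """Convert an install count such as '10,000+' to an integer.

    Raises ValueError if the text left after removing ',' and '+'
    is not an integer literal.
    """
    cleaned = str(raw).replace(",", "").replace("+", "").strip()
    return int(cleaned)

print(parse_installs("10,000+"))
print(parse_installs("1,000,000,000+"))
```

Applied with `playstore_df['Installs'].map(parse_installs)`, this yields values that `nlargest` and `sum` can work with. Counts written with a trailing `+` read as lower bounds, so many apps may tie at the top.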

#### Q15. Identify Popular App Categories:
```python
# Bar chart for app categories and installs
category_installs = playstore_df.groupby('Category')['Installs'].sum().sort_values(ascending=False)
category_installs.plot(kind='bar', color='orange', alpha=0.7)
plt.title('Installs by App Category')
plt.xlabel('App Category')
plt.ylabel('Total Installs')
plt.show()
```

#### Q16. Identify Successful App Developers:
```python
# Bar chart for top app developers
top_developers = playstore_df.groupby('Developer')['Installs'].sum().nlargest(10)
top_developers.plot(kind='bar', color='purple', alpha=0.7)
plt.title('Top 10 Successful App Developers')
plt.xlabel('App Developer')
plt.ylabel('Total Installs')
plt.show()
```

#### Q17. Identify Best Time to Launch a New App:
*Assuming there's a 'Last Updated' column indicating the last update date.*
```python
# Convert 'Last Updated' column to datetime
playstore_df['Last Updated'] = pd.to_datetime(playstore_df['Last Updated'])

# Extract year from the last update date
playstore_df['Year'] = playstore_df['Last Updated'].dt.year

# Line chart of total installs by year of last update
# errorbar=None replaces ci=None, which is deprecated since seaborn 0.12
plt.figure(figsize=(12, 6))
sns.lineplot(x='Year', y='Installs', data=playstore_df, estimator='sum', errorbar=None)
plt.title('App Installs Over the Years')
plt.xlabel('Year')
plt.ylabel('Total Installs')
plt.show()
```

