# Flight Price & Google Playstore Data Analysis

## Q1. Load the flight price dataset and examine its dimensions. How many rows and columns does the dataset have?

In [None]:
import pandas as pd

# Load the flight price dataset (update path as needed)
flight = pd.read_csv('flight_price.csv')
print(f"Rows: {flight.shape[0]}, Columns: {flight.shape[1]}")

## Q2. What is the distribution of flight prices in the dataset? Create a histogram to visualize the distribution.

In [None]:
import matplotlib.pyplot as plt

plt.hist(flight['Price'], bins=30, color='skyblue', edgecolor='black')
plt.xlabel('Flight Price')
plt.ylabel('Frequency')
plt.title('Distribution of Flight Prices')
plt.show()

## Q3. What is the range of prices in the dataset? What is the minimum and maximum price?

In [None]:
min_price = flight['Price'].min()
max_price = flight['Price'].max()
# The range is the difference between the highest and lowest price
print(f"Min price: {min_price}, Max price: {max_price}, Range: {max_price - min_price}")

## Q4. How does the price of flights vary by airline? Create a boxplot to compare the prices of different airlines.

In [None]:
import seaborn as sns

plt.figure(figsize=(12,6))
sns.boxplot(x='Airline', y='Price', data=flight)
plt.xticks(rotation=45)
plt.title('Flight Prices by Airline')
plt.show()

## Q5. Are there any outliers in the dataset? Identify any potential outliers using a boxplot and describe how they may impact your analysis.

In [None]:
plt.figure(figsize=(8,4))
sns.boxplot(x=flight['Price'])
plt.title('Boxplot of Flight Prices')
plt.show()

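# Points beyond the whiskers (by default, more than 1.5 * IQR from the quartiles) are potential outliers.
# They pull the mean away from the median and can distort models that are sensitive to extreme values.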
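
The boxplot shows outliers visually; counting them makes the impact easier to describe. A minimal sketch of the 1.5 × IQR rule, using a small made-up series in place of `flight['Price']` (the values are illustrative, not taken from the dataset):

```python
import pandas as pd

# Hypothetical prices standing in for flight['Price']
prices = pd.Series([3500, 4200, 3900, 4100, 4600, 5000, 4800, 25000])

q1 = prices.quantile(0.25)
q3 = prices.quantile(0.75)
iqr = q3 - q1

# Flag values more than 1.5 * IQR below Q1 or above Q3
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = prices[(prices < lower) | (prices > upper)]

print(f"Bounds: [{lower}, {upper}]")
print(f"Outliers found: {len(outliers)}")

# Compare the mean with and without the flagged values
mean_all = prices.mean()
mean_trimmed = prices.drop(outliers.index).mean()
print(f"Mean with outliers: {mean_all:.0f}, without: {mean_trimmed:.0f}")
```

Comparing the two means gives a concrete measure of how much the extreme fares shift summary statistics.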

## Q6. You are working for a travel agency, and your boss has asked you to analyze the Flight Price dataset to identify the peak travel season. What features would you analyze to identify the peak season, and how would you present your findings to your boss?

**Answer:**

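Analyze the journey date, the month or season derived from it, and the number of flights per period as a proxy for bookings. Group by month or season and plot both the flight count and the average price; periods where both are high indicate peak season. Present the findings with a line or bar chart of the monthly figures, with the peak months highlighted.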
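
A minimal sketch of the monthly grouping, assuming a `Date_of_Journey` column in day/month/year format and a numeric `Price` column. Both the column names and the sample rows are assumptions for illustration; adjust them to the actual dataset.

```python
import pandas as pd

# Hypothetical sample; the real column names and date format may differ
flights = pd.DataFrame({
    "Date_of_Journey": ["24/03/2019", "01/05/2019", "09/05/2019",
                        "12/06/2019", "18/05/2019", "27/03/2019"],
    "Price": [3900, 7600, 8200, 5400, 7900, 4100],
})

# Parse the date and derive the month
flights["Date_of_Journey"] = pd.to_datetime(flights["Date_of_Journey"], format="%d/%m/%Y")
flights["Month"] = flights["Date_of_Journey"].dt.month

# Flight count and average price per month
monthly = flights.groupby("Month")["Price"].agg(n_flights="count", avg_price="mean")
peak_month = monthly["n_flights"].idxmax()

print(monthly)
print(f"Busiest month: {peak_month}")
```

Calling `monthly.plot(kind="bar", subplots=True)` would give one bar chart per metric, which is usually enough for a non-technical audience.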

## Q7. You are a data analyst for a flight booking website, and you have been asked to analyze the Flight Price dataset to identify any trends in flight prices. What features would you analyze to identify these trends, and what visualizations would you use to present your findings to your team?

**Answer:**

Analyze features like date, airline, source, destination, and duration. Use line plots for price over time, boxplots for price by airline, and heatmaps for price by route.
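
The price-by-route heatmap needs a source-by-destination table first. A minimal sketch, assuming `Source`, `Destination`, and `Price` columns (the names and rows below are hypothetical):

```python
import pandas as pd

# Hypothetical sample rows
flights = pd.DataFrame({
    "Source": ["Delhi", "Delhi", "Kolkata", "Kolkata", "Delhi"],
    "Destination": ["Cochin", "Cochin", "Bangalore", "Bangalore", "Bangalore"],
    "Price": [9000, 11000, 6000, 8000, 5000],
})

# Average price for every source/destination pair; routes with no flights become NaN
route_prices = flights.pivot_table(
    index="Source", columns="Destination", values="Price", aggfunc="mean"
)
print(route_prices)
```

The resulting table can be passed to `sns.heatmap(route_prices, annot=True)` to draw the heatmap.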

## Q8. You are a data scientist working for an airline company, and you have been asked to analyze the Flight Price dataset to identify the factors that affect flight prices. What features would you analyze to identify these factors, and how would you present your findings to the management team?

**Answer:**

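Analyze features such as airline, source, destination, duration, number of stops, and departure or booking time. Use correlation analysis for numeric features, a regression model to estimate each factor's effect, and feature importance plots to rank the factors when presenting to management.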
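
As a first pass at the correlation step, the numeric features can be ranked by their correlation with price. A minimal sketch, assuming stops and duration have already been converted to numbers; the column names `Total_Stops` and `Duration_min` and the values are hypothetical:

```python
import pandas as pd

# Hypothetical numeric features; real data would need cleaning first
flights = pd.DataFrame({
    "Total_Stops": [0, 1, 2, 0, 1, 2],
    "Duration_min": [120, 300, 600, 130, 320, 580],
    "Price": [3000, 5000, 8000, 3200, 5500, 7900],
})

# Pearson correlation of each feature with Price, strongest first
corr_with_price = (
    flights.corr()["Price"].drop("Price").sort_values(ascending=False)
)
print(corr_with_price)
```

Correlation only captures linear relationships between numeric columns; categorical features such as airline need encoding or a grouped comparison instead.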

## Q9. Load the Google Playstore dataset and examine its dimensions. How many rows and columns does the dataset have?

In [None]:
playstore = pd.read_csv('googleplaystore.csv')
print(f"Rows: {playstore.shape[0]}, Columns: {playstore.shape[1]}")

## Q10. How does the rating of apps vary by category? Create a boxplot to compare the ratings of different app categories.

In [None]:
plt.figure(figsize=(14,6))
sns.boxplot(x='Category', y='Rating', data=playstore)
plt.xticks(rotation=90)
plt.title('App Ratings by Category')
plt.show()

## Q11. Are there any missing values in the dataset? Identify any missing values and describe how they may impact your analysis.

In [None]:
missing = playstore.isnull().sum()
print(missing[missing > 0])
# Missing values can bias results or reduce data size if rows are dropped.

## Q12. What is the relationship between the size of an app and its rating? Create a scatter plot to visualize the relationship.

In [None]:
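# 'Size' is often stored as text (e.g. '19M', '201k'); convert it to megabytes before plotting.
# Assumes that format; anything unparseable (e.g. 'Varies with device') becomes NaN and is not plotted.
size_text = playstore['Size'].astype(str)
size_mb = pd.to_numeric(size_text.str.replace(r'[Mk]$', '', regex=True), errors='coerce')
size_mb = size_mb.where(~size_text.str.endswith('k'), size_mb / 1024)
plt.scatter(size_mb, playstore['Rating'], alpha=0.5)
plt.xlabel('App Size (MB)')
plt.ylabel('Rating')
plt.title('App Size vs. Rating')
plt.show()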

## Q13. How does the type of app affect its price? Create a bar chart to compare average prices by app type.

In [None]:
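# 'Price' may be stored as text such as '$4.99'; strip the symbol and convert to numbers first.
price = pd.to_numeric(playstore['Price'].astype(str).str.replace('$', '', regex=False), errors='coerce')
app_type_price = price.groupby(playstore['Type']).mean()
app_type_price.plot(kind='bar', color='orange')
plt.ylabel('Average Price')
plt.title('Average App Price by Type')
plt.show()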

## Q14. What are the top 10 most popular apps in the dataset? Create a frequency table to identify the apps with the highest number of installs.

In [None]:
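# 'Installs' may be stored as text such as '10,000+'; strip ',' and '+' before converting to numbers.
installs = pd.to_numeric(playstore['Installs'].astype(str).str.replace(r'[,+]', '', regex=True), errors='coerce')
top_apps = installs.groupby(playstore['App']).max().sort_values(ascending=False).head(10)  # max avoids double-counting duplicate rows
print(top_apps)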

## Q15. A company wants to launch a new app on the Google Playstore and has asked you to analyze the Google Playstore dataset to identify the most popular app categories. How would you approach this task, and what features would you analyze to make recommendations to the company?

**Answer:**

Analyze the 'Category' and 'Installs' features. Group by category and sum installs to find the most popular categories. Recommend categories with the highest total installs.
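
A minimal sketch of the grouping described above, assuming `Installs` is stored as text such as `'1,000,000+'` and must be cleaned first. The sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows
apps = pd.DataFrame({
    "Category": ["GAME", "GAME", "TOOLS", "COMMUNICATION", "TOOLS"],
    "Installs": ["1,000,000+", "500,000+", "100,000+", "10,000,000+", "50,000+"],
})

# Strip ',' and '+' so the install counts can be summed
apps["Installs_num"] = pd.to_numeric(
    apps["Installs"].str.replace(r"[,+]", "", regex=True), errors="coerce"
)

# Total installs per category, most popular first
category_installs = (
    apps.groupby("Category")["Installs_num"].sum().sort_values(ascending=False)
)
print(category_installs)
```

Because install counts in this kind of data are lower bounds ("1,000,000+" means at least one million), the totals are best read as a ranking rather than exact figures.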

## Q16. A mobile app development company wants to analyze the Google Playstore dataset to identify the most successful app developers. What features would you analyze to make recommendations to the company, and what data visualizations would you use to present your findings?

**Answer:**

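Analyze 'App', 'Developer', 'Installs', and 'Rating', provided the dataset includes a developer column; if it does not, developer names must be joined from another source. Group by developer, sum installs, and average ratings. Use bar charts to show the top developers by total installs and by average rating.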
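
A minimal sketch of the developer summary. The `Developer` column is hypothetical and may not exist in the file loaded earlier; `Installs` is assumed to be numeric already, and the sample rows are made up.

```python
import pandas as pd

# Hypothetical sample; 'Developer' is an assumed column
apps = pd.DataFrame({
    "Developer": ["Acme", "Acme", "Bolt", "Bolt", "Cedar"],
    "Installs": [1_000_000, 500_000, 5_000_000, 100_000, 10_000],
    "Rating": [4.5, 4.1, 3.9, 4.3, 4.8],
})

# One row per developer: total installs, average rating, number of apps
developers = (
    apps.groupby("Developer")
    .agg(
        total_installs=("Installs", "sum"),
        avg_rating=("Rating", "mean"),
        n_apps=("Installs", "size"),
    )
    .sort_values("total_installs", ascending=False)
)
print(developers)
```

Including `n_apps` helps separate developers with one hit app from those with a consistently strong portfolio.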

## Q17. A marketing research firm wants to analyze the Google Playstore dataset to identify the best time to launch a new app. What features would you analyze to make recommendations to the company, and what data visualizations would you use to present your findings?

**Answer:**

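Analyze 'Last Updated' (or a release date, if available), installs, and ratings. Note that 'Last Updated' records the most recent update rather than the original launch, so it is only a proxy for launch timing. Group by month or season to find trends, and use line plots or bar charts to present the periods with the strongest installs and ratings.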
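
A minimal sketch of the monthly grouping, assuming 'Last Updated' is stored as text like `'January 7, 2018'` and `Installs` is already numeric. The format and the sample rows are assumptions.

```python
import pandas as pd

# Hypothetical sample rows
apps = pd.DataFrame({
    "Last Updated": ["January 7, 2018", "July 31, 2018", "July 3, 2018",
                     "August 1, 2018", "January 15, 2018"],
    "Installs": [10_000, 1_000_000, 500_000, 100_000, 50_000],
    "Rating": [4.1, 4.4, 4.2, 3.9, 4.0],
})

# Parse the date text and derive the month
apps["Last Updated"] = pd.to_datetime(apps["Last Updated"], format="%B %d, %Y")
apps["Month"] = apps["Last Updated"].dt.month

# Apps, median installs, and average rating per month
by_month = apps.groupby("Month").agg(
    n_apps=("Installs", "size"),
    median_installs=("Installs", "median"),
    avg_rating=("Rating", "mean"),
)
print(by_month)
```

The median is used for installs because a few very popular apps would otherwise dominate a monthly mean.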