**ID 10360**

```As a data scientist at Amazon Prime Video, you are tasked with enhancing the in-flight entertainment experience for Amazon’s airline partners. Your challenge is to develop a feature that suggests individual movies from Amazon's content database that fit within a given flight's duration. For flight 101, find movies whose runtime is less than or equal to the flight's duration. The output should list suggested movies for the flight, including 'flight_id', 'movie_id', and 'movie_duration'.```

In [None]:
%%sql
SELECT fs.flight_id,
       ec.movie_id,
       ec.duration AS movie_duration
FROM flight_schedule AS fs
         JOIN entertainment_catalog AS ec
              ON ec.duration <= fs.flight_duration -- non-equi join: every movie that fits the flight
WHERE fs.flight_id = 101
ORDER BY ec.duration

In [None]:
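import pandas as pd

# Cross join flight 101 with every movie, keep movies that fit, and rename to match the required output
df = (pd.merge(entertainment_catalog, flight_schedule.query('flight_id == 101'), how='cross')
        .query('duration <= flight_duration')
        .rename(columns={'duration': 'movie_duration'})[['flight_id', 'movie_id', 'movie_duration']]
        .sort_values('movie_duration'))
df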

**ID 10362**

```You are provided with a transactional dataset from Amazon that contains detailed information about sales across different products and marketplaces. Your task is to list the top 3 sellers in each product category for January. The output should contain 'seller_id', 'total_sales', 'product_category', 'market_place', and 'month'.```

In [None]:
%%sql
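WITH ranked AS (SELECT *,
                       DENSE_RANK() OVER (PARTITION BY product_category
                                          ORDER BY total_sales DESC) AS rnk
                FROM sales_data
                -- month is a date/timestamp column, so compare its year-month text form
                WHERE TO_CHAR(month, 'YYYY-MM') = '2024-01')
SELECT seller_id,
       total_sales,
       product_category,
       market_place,
       month
FROM ranked
WHERE rnk <= 3
ORDER BY product_category, total_sales DESC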

In [None]:
df = sales_data.copy()  # work on a copy so the source table is not modified
df['str_month'] = df['month'].dt.strftime('%Y-%m')
filtered_df = df.query('str_month == "2024-01"').copy()

# Dense rank within each category: tied sales share the same rank
filtered_df['rnk'] = filtered_df.groupby('product_category')['total_sales'].rank(method='dense', ascending=False)

(filtered_df[filtered_df['rnk'] <= 3]
    [['seller_id', 'total_sales', 'product_category', 'market_place', 'month']]
    .sort_values(['product_category', 'total_sales'], ascending=[True, False]))
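
A note on ties: `DENSE_RANK` (and pandas `rank(method='dense')`) gives equal sales the same rank, so a category can return more than three sellers. The toy frame below is invented data for illustration only; it contrasts dense ranking with `method='first'`, which breaks ties by row order and therefore caps each category at exactly three rows.

```python
import pandas as pd

# Hypothetical toy data: sellers s1 and s2 tie for first place in category "A".
toy = pd.DataFrame({
    "seller_id": ["s1", "s2", "s3", "s4", "s5", "s6"],
    "product_category": ["A", "A", "A", "A", "A", "B"],
    "total_sales": [100, 100, 90, 80, 70, 50],
})

grouped = toy.groupby("product_category")["total_sales"]

# Dense ranking in category A: 1, 1, 2, 3, 4 (ties share a rank, no gaps).
dense = toy[grouped.rank(method="dense", ascending=False) <= 3]

# 'first' ranking in category A: 1, 2, 3, 4, 5 (ties broken by order of appearance).
first = toy[grouped.rank(method="first", ascending=False) <= 3]

print(dense["seller_id"].tolist())  # ['s1', 's2', 's3', 's4', 's6']
print(first["seller_id"].tolist())  # ['s1', 's2', 's3', 's6']
```

Which behaviour is right depends on how "top 3" is meant to treat ties; the cells above keep all tied sellers.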

**ID 10363**

```For each week, find the total number of orders. Include only the orders that are from the first quarter of 2023. The output should contain 'week' and 'quantity'.```

In [None]:
%%sql
SELECT week,
       SUM(quantity) AS quantity
FROM orders_analysis
WHERE EXTRACT(QUARTER FROM week) = 1
  AND EXTRACT(YEAR FROM week) = 2023
GROUP BY week
ORDER BY week

In [None]:
df = orders_analysis
q1_2023 = df[(df['week'].dt.quarter == 1) & (df['week'].dt.year == 2023)]
q1_2023.groupby('week', as_index=False)['quantity'].sum()  # groupby also sorts by week
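
Sorting on `EXTRACT(WEEK FROM week)` rather than on the date itself is risky near the new year: PostgreSQL's `WEEK` field follows ISO 8601 week numbering, where week 1 is the week containing the year's first Thursday. The illustrative check below (not part of the solution) shows that 1 January 2023, a Sunday, falls in ISO week 52 of 2022 even though its calendar quarter is Q1 2023, so ordering by week number would push it to the end.

```python
import pandas as pd

new_year = pd.Timestamp("2023-01-01")  # a Sunday

iso = new_year.isocalendar()  # (ISO year, ISO week, ISO weekday)
print(iso)                              # ISO year 2022, week 52, weekday 7
print(new_year.quarter, new_year.year)  # 1 2023

# Three Q1 2023 dates, sorted by ISO week number instead of by date.
dates = pd.Series(pd.to_datetime(["2023-01-01", "2023-01-02", "2023-03-27"]))
week_numbers = dates.dt.isocalendar()["week"].astype(int).to_numpy()  # [52, 1, 13]
by_week_number = dates.iloc[week_numbers.argsort(kind="stable")]
print(by_week_number.dt.strftime("%Y-%m-%d").tolist())  # January 1 ends up last
```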

**ID 10364**

```You have access to Facebook's database, which includes several tables relevant to user interactions. For this task, you are particularly interested in the tables that store data about user posts, friendships, and likes. Calculate the total number of likes made on friends' posts on Fridays. The output should contain two columns, 'likes' and 'date'.```

In [None]:
%%sql
WITH filtered_likes AS (SELECT post_id,
                               user_name,
                               date_liked
                        FROM likes
                        -- DATE_PART('dow') numbers Sunday as 0, so Friday is 5
                        WHERE DATE_PART('dow', date_liked) = 5),
     result_table AS (SELECT up.post_id,
                             up.user_name AS host_name,
                             fl.user_name AS guest_name,
                             up.date_posted,
                             fl.date_liked
                      FROM user_posts AS up
                               JOIN filtered_likes AS fl ON up.post_id = fl.post_id
                      -- keep the like only if poster and liker are friends, in either column order
                      WHERE EXISTS (SELECT 1
                                    FROM friendships AS f
                                    WHERE (f.user_name1 = up.user_name AND f.user_name2 = fl.user_name)
                                       OR (f.user_name1 = fl.user_name AND f.user_name2 = up.user_name)))
SELECT date_liked,
       COUNT(*) AS likes
FROM result_table
GROUP BY date_liked
ORDER BY date_liked
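
Only a SQL solution is given for this problem. Below is a pandas sketch of the same logic, written as a function so it runs on its own. The column names (`post_id`, `user_name`, `date_liked`, `user_name1`, `user_name2`) come from the query above; the function name `friday_friend_likes` and the toy rows are hypothetical, invented only to exercise it. As in the SQL, a friendship may be stored in either column order, so the pair is matched both ways.

```python
import pandas as pd


def friday_friend_likes(user_posts, likes, friendships):
    """Count likes placed on a friend's post on a Friday, grouped by like date."""
    # pandas numbers weekdays Monday=0 ... Sunday=6, so Friday is 4
    # (PostgreSQL's DATE_PART('dow', ...) uses Sunday=0, so Friday is 5 there).
    friday_likes = likes[likes["date_liked"].dt.dayofweek == 4]

    # Pair each liked post's author ("host") with the user who liked it ("guest").
    joined = user_posts[["post_id", "user_name"]].merge(
        friday_likes[["post_id", "user_name", "date_liked"]],
        on="post_id",
        suffixes=("_host", "_guest"),
    )

    # List every friendship in both directions so a single merge matches either order.
    forward = friendships.rename(columns={"user_name1": "user_name_host", "user_name2": "user_name_guest"})
    backward = friendships.rename(columns={"user_name2": "user_name_host", "user_name1": "user_name_guest"})
    pairs = pd.concat([forward, backward])[["user_name_host", "user_name_guest"]].drop_duplicates()

    friend_likes = joined.merge(pairs, on=["user_name_host", "user_name_guest"])
    return (friend_likes.groupby("date_liked").size()
            .reset_index(name="likes")
            .sort_values("date_liked", ignore_index=True))


# Hypothetical toy data; 2024-01-05 is a Friday and 2024-01-06 a Saturday.
posts = pd.DataFrame({"post_id": [1, 2], "user_name": ["ann", "bob"]})
likes_df = pd.DataFrame({
    "post_id": [1, 1, 2, 2],
    "user_name": ["bob", "cat", "ann", "dan"],
    "date_liked": pd.to_datetime(["2024-01-05", "2024-01-05", "2024-01-05", "2024-01-06"]),
})
friends = pd.DataFrame({"user_name1": ["ann", "ann"], "user_name2": ["bob", "dan"]})

result = friday_friend_likes(posts, likes_df, friends)
print(result)  # one row: 2024-01-05 with 2 likes (bob on ann's post, ann on bob's post)
```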

**ID 10365**

```You are analyzing a social network dataset at Google. Your task is to find mutual friends between two users, Karl and Hans. There is only one user named Karl and one named Hans in the dataset. The output should contain 'user_id' and 'user_name' columns.```

In [None]:
%%sql
WITH mutual AS (SELECT friend_id
                FROM friends
                WHERE user_id IN (SELECT user_id FROM users WHERE user_name = 'Karl')
                INTERSECT
                SELECT friend_id
                FROM friends
                WHERE user_id IN (SELECT user_id FROM users WHERE user_name = 'Hans'))
SELECT user_id,
       user_name
FROM users
WHERE user_id IN (SELECT friend_id FROM mutual)
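
Only a SQL solution is given here as well. A pandas sketch of the same set intersection follows. Like the query, it assumes `friends` lists each friendship from `user_id` to `friend_id`; the helper name `mutual_friends` and the toy rows are hypothetical.

```python
import pandas as pd


def mutual_friends(users, friends, name_a, name_b):
    """Return the users listed as a friend_id of both name_a and name_b."""
    def friend_ids(name):
        # Resolve the name to its user_id(s), then collect the friend_ids listed for them.
        ids = users.loc[users["user_name"] == name, "user_id"]
        return set(friends.loc[friends["user_id"].isin(ids), "friend_id"])

    common = friend_ids(name_a) & friend_ids(name_b)  # plays the role of SQL INTERSECT
    mask = users["user_id"].isin(common)
    return users.loc[mask, ["user_id", "user_name"]].reset_index(drop=True)


# Hypothetical toy data: Karl (1) and Hans (2) share friends 3 and 4.
users = pd.DataFrame({"user_id": [1, 2, 3, 4, 5],
                      "user_name": ["Karl", "Hans", "Emma", "Liam", "Lena"]})
friends = pd.DataFrame({"user_id": [1, 1, 1, 2, 2, 2],
                        "friend_id": [2, 3, 4, 3, 4, 5]})

result = mutual_friends(users, friends, "Karl", "Hans")
print(result)  # Emma (3) and Liam (4)
```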

**ID 10366**

```Capital One's marketing team is working on a project to analyze customer feedback from their feedback surveys. The team sorted the feedback comments into three length categories: short_comments, mid_length_comments, and long_comments. The team wants to find comments that are not short and that come from social media. The output should include 'feedback_id', 'feedback_text', 'source_channel', and a calculated category.```

In [None]:
%%sql
SELECT DISTINCT feedback_id,
                feedback_text,
                source_channel,
                comment_category
FROM customer_feedback
WHERE comment_category != 'short_comments'
  AND source_channel = 'social_media'

In [None]:
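df = customer_feedback

# Select the output columns first so drop_duplicates matches SELECT DISTINCT on those columns
(df.query('source_channel == "social_media" & comment_category != "short_comments"')
   [['feedback_id', 'feedback_text', 'source_channel', 'comment_category']]
   .drop_duplicates())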

**ID 10367**

```You're tasked with analyzing a Spotify-like dataset that captures user listening habits. For each user, calculate the total listening time and the count of unique songs they've listened to. In the database, durations are stored in seconds. Round the total listening duration to the nearest whole minute. The output should contain three columns: 'user_id', 'total_listen_duration', and 'unique_song_count'.```

In [None]:
%%sql
SELECT user_id,
       -- divide by 60.0, not 60: integer division would truncate before ROUND runs
       ROUND(SUM(listen_duration) / 60.0) AS total_listen_duration,
       COUNT(DISTINCT song_id)            AS unique_song_count
FROM listening_habits
GROUP BY user_id

In [None]:
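import numpy as np

df = listening_habits
result = df.groupby('user_id', as_index=False).agg(total_listen_duration=('listen_duration', 'sum'),
                                                   unique_song_count=('song_id', 'nunique'))
# Seconds -> minutes, rounding halves up (as PostgreSQL's ROUND does on numeric)
# instead of Python's round-half-to-even
result['total_listen_duration'] = np.floor(result['total_listen_duration'] / 60 + 0.5).astype(int)
result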
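
A rounding detail worth checking when comparing SQL and pandas results: Python's built-in `round()` and pandas `Series.round()` send exact halves to the nearest even number, while PostgreSQL's `ROUND` on a `numeric` value rounds halves away from zero. A listening total that is an odd multiple of 30 seconds lands exactly on a half minute, so the two can disagree. The snippet below uses made-up totals to show the difference, together with a half-up alternative, `floor(x + 0.5)`, which is only valid for non-negative values.

```python
import numpy as np
import pandas as pd

# Made-up per-user totals in seconds: 90 s = 1.5 min, 150 s = 2.5 min, 200 s ≈ 3.33 min.
seconds = pd.Series([90, 150, 200])
minutes = seconds / 60

half_even = minutes.round().astype(int)          # round half to even
half_up = np.floor(minutes + 0.5).astype(int)    # round half up (non-negative input only)

print([round(1.5), round(2.5)])          # [2, 2]: the built-in round() is half-to-even too
print(half_even.tolist(), half_up.tolist())  # [2, 2, 3] [2, 3, 3]
```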

**ID 10368**

```You are working on a data analysis project at Deloitte where you need to analyze a dataset containing information about various cities. Your task is to calculate the population density of these cities, rounded to the nearest integer, and identify the cities with the minimum and maximum densities. The population density should be calculated as (Population / Area). The output should contain 'city', 'country', 'density'.```

In [None]:
%%sql
WITH densities AS (SELECT city,
                          country,
                          -- cast to numeric so integer columns are not truncated, then round
                          ROUND(population::numeric / area) AS density
                   FROM cities_population
                   WHERE area > 0),
     ranked AS (SELECT *,
                       RANK() OVER (ORDER BY density)      AS low_rnk,
                       RANK() OVER (ORDER BY density DESC) AS high_rnk
                FROM densities)
SELECT city,
       country,
       density
FROM ranked
WHERE low_rnk = 1
   OR high_rnk = 1

In [None]:
df = cities_population.query('area > 0').copy()  # copy avoids SettingWithCopyWarning below

# Density rounded to the nearest integer, as the task requires
df['density'] = (df['population'] / df['area']).round()
df['low_rank'] = df['density'].rank(method='min')
df['high_rank'] = df['density'].rank(method='min', ascending=False)
df.query('low_rank == 1 | high_rank == 1')[['city', 'country', 'density']]