<a href="https://colab.research.google.com/github/Anjali-Narwaria/Amazon-Prime-Video-s-content-catalog/blob/main/Copy_of_Sample_EDA_Submission_Template.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Amazon Prime Video Data Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Individual


# **Project Summary -**

This project focused on a comprehensive exploration of Amazon Prime Video’s content catalog using a wide variety of visualizations to extract key insights for strategic business decisions. The analysis covered several dimensions of the dataset, including content types, genres, release years, viewer ratings, popularity metrics, runtimes, age certifications, production countries, and notable cast involvement. Through this multidimensional approach, the aim was to understand content distribution, audience preferences, and how these factors relate to user engagement and platform growth.

Content Type and Genre Distribution :
The project began with foundational charts such as the Pie Chart of content types and a Horizontal Bar Chart of top genres, revealing the platform’s content mix and genre popularity. Drama, Comedy, and Action often emerged as dominant genres, shaping the core of viewer interest. Moving beyond counts, the Line Chart of titles by release year highlighted temporal trends, showing how the volume of releases evolved over time—crucial for understanding content freshness and industry cycles.

Audience Ratings and Popularity :
To assess content quality and appeal, a Histogram of IMDb scores was created to visualize rating distributions, supplemented by a Scatter Plot connecting IMDb score and TMDb popularity with show type distinctions. These visualizations revealed clusters of well-rated popular content and identified segments where rating and popularity diverge—key for content investment decisions. Runtime’s impact on ratings was examined using a Boxplot by runtime band alongside scatter plots with regression lines, uncovering relationships between content length and perceived quality.

Demographic and Classification Perspectives :
Investigations into age appropriateness were supported by the Age Certification Counts Bar Chart and a complex Boxplot Grid showing IMDb scores by age certification and genre. These enabled nuanced insights into how content is classified and received across different viewer age groups and genres, facilitating targeted content curation and regulatory compliance.

Geographic and Production Insights :
Exploration of production origins through Production Country frequency and detailed Grouped and Stacked Bar Plots comparing genres and production countries illuminated geographic concentrations and diversity of the content portfolio. Findings showed the US and select countries dominating certain genres, with opportunities identified for market expansion and content diversification. Similarly, actor prominence was analyzed with a grouped bar plot depicting top actors’ presence by genre, adding a talent dimension to content assessment.

Advanced Statistical Correlations and Relationships :
The project employed quantitative relationship analysis through a Correlation Heatmap and a Pair Plot visualizing interrelations among IMDb score, popularity, runtime, and release year. These tools helped identify subtle correlations and patterns, guiding feature selection for recommendation systems and content performance optimization.

Business Insights and Strategic Recommendations :
The broad spectrum of visual analyses provided a multi-angle understanding of the Amazon Prime Video catalog’s strengths and gaps. This informs a data-driven strategy encompassing:

Optimized Content Strategy: Focus investments and acquisitions on genres, runtimes, and production countries demonstrating strong audience ratings and popularity trends.

Targeted Marketing and Audience Segmentation: Utilize geographic and demographic insights to customize campaigns, enhancing engagement with specific viewer groups and regions.

Portfolio Quality Assurance: Leverage insights from age certification and ratings distributions to maintain content standards and improve user satisfaction.

Catalog Diversity and Localization: Address gaps in underrepresented genres and countries, fostering diverse content offerings and expanding global appeal.

Continuous Monitoring: Adopt ongoing analysis workflows using heatmaps and pair plots to adapt quickly to evolving user preferences and market dynamics.

By harnessing these data-backed findings, the client can make more informed, precise decisions that respond to viewer preferences more effectively. This promotes a more engaging, diverse, and high-quality streaming experience, positioning Amazon Prime Video for sustained success in the dynamic OTT landscape.

# **GitHub Link -**

# **Problem Statement**


This project is created to analyze Amazon Prime Video’s content catalog in the United States to identify key trends and insights. By examining data on show types, genres, release years, ratings, and contributions from actors and directors, we aim to answer questions such as whether movies are more common than TV shows, which genres are most popular, how content distribution has changed over time, and how user ratings compare. The findings will help stakeholders, marketers, and investors make informed decisions about content strategy and audience engagement.

#### **Define Your Business Objective?**

The business objective of this project is to generate actionable insights from Amazon Prime Video’s US content catalog to guide strategic decision-making for stakeholders, marketers, and investors. By analyzing and visualizing trends in show types, genres, production regions, ratings, and creator participation, the project aims to support content selection, marketing strategies, and investment allocation with data-driven evidence.

-> Specific Objectives

- Identify content dominance to determine whether movies or TV shows represent the larger share of the Amazon Prime Video library.

- Analyze genre popularity to reveal which genres are most prevalent and trending, helping content strategists focus on viewer preferences.

- Track content growth and trends over time to examine the evolution of the platform’s catalog, including release years and production trends, to inform future acquisitions.

- Assess ratings and popularity correlation to compare IMDb and TMDb scores to understand public perception and quality alignment.

- Understand regional and talent diversity by analyzing the contributions of different countries, actors, and directors, supporting decisions about international content and talent partnerships.

- Inform stakeholder decisions and provide clear, data-driven recommendations useful for content planning, marketing campaigns, and investment prioritization within the streaming sector.

By fulfilling these objectives, the project will turn raw data into actionable knowledge, ensuring decisions are based on measurable trends and audience behaviors in the streaming industry.



# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from collections import Counter
import seaborn as sns

# Enable inline plotting
%matplotlib inline

### Dataset Loading

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Load the datasets
titles = pd.read_csv("/content/drive/MyDrive/titles.csv")
credits = pd.read_csv("/content/drive/MyDrive/credits.csv")

# Quick inspection
print(titles.shape)
print(titles.columns)
print(titles.head())

print(credits.shape)
print(credits.columns)
print(credits.head())


### Dataset First View

In [None]:
# Display first 5 rows of titles as a table
display(titles.head())

# Display first 5 rows of credits as a table
display(credits.head())

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count of titles
print("Titles Dataset shape (rows, columns):", titles.shape)
print("Number of rows in titles:", titles.shape[0])
print("Number of columns in titles:", titles.shape[1])

# Dataset Rows & Columns count for credits
print("Credits Dataset shape (rows, columns):", credits.shape)
print("Number of rows in credits:", credits.shape[0])
print("Number of columns in credits:", credits.shape[1])


### Dataset Information

In [None]:
# Dataset Info
titles.info()
credits.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
print("Number of duplicate rows in titles:", titles.duplicated().sum())
print("Number of duplicate rows in credits:", credits.duplicated().sum())

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
print("Missing values in titles:")
print(titles.isnull().sum())

print("\nMissing values in credits:")
print(credits.isnull().sum())

In [None]:
# Visualizing the missing values
#boolean mask of missing values (true=missing)
sns.heatmap(titles.isnull(), cbar=False, yticklabels=False, cmap='viridis')
plt.title("Missing Values Heatmap for Titles")
plt.show()

sns.heatmap(credits.isnull(), cbar=False, yticklabels=False, cmap='viridis')
plt.title("Missing Values Heatmap for Credits")
plt.show()






### What did you know about your dataset?

- The dataset consists of two files: titles.csv (over 9,000 Prime Video titles) and credits.csv (over 124,000 cast and crew records).

- Each title includes attributes such as title name, show type (movie or TV show), description, release year, age certification, runtime, genres, production countries, IMDb and TMDb ratings, number of votes, and popularity scores.

- The credits data ties actors and directors to each title, giving columns for person ID, title ID, person’s name, character played, and role type (Actor or Director).

- The data is a mix of categorical (like show type, genres, countries) and numerical (like scores, votes, runtime) variables.

- The data covers only shows and movies available in the US region on Amazon Prime Video.

- There are some missing values, especially in ratings, genre lists, and certain categorical fields.

- Both recent and older titles are included, allowing for trend analysis over time.

- The dataset enables exploration of content diversity, regional production, ratings, creator participation, and platform trends.
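The link between the two files described above can be sketched on toy data. This is a minimal illustration: the rows are invented, and only the column names `id`, `title`, `type`, `name`, and `role` follow the notebook's datasets.

```python
import pandas as pd

# Toy stand-ins for titles.csv and credits.csv (values are invented for illustration)
toy_titles = pd.DataFrame({
    "id": ["tm1", "ts2"],
    "title": ["Example Movie", "Example Show"],
    "type": ["MOVIE", "SHOW"],
})
toy_credits = pd.DataFrame({
    "id": ["tm1", "tm1", "ts2"],
    "name": ["Actor A", "Director B", "Actor C"],
    "role": ["ACTOR", "DIRECTOR", "ACTOR"],
})

# Each credit row picks up the attributes of the title it belongs to
toy_merged = toy_credits.merge(toy_titles, on="id", how="left")

# Count cast and crew records per title
people_per_title = toy_merged.groupby("title")["name"].count()
print(people_per_title)
```

A left merge keeps every credit row even if its title is missing from the titles table, which makes unmatched IDs easy to spot afterwards.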

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
# List all columns in titles.csv
print(list(titles.columns))

# List all columns in credits.csv
print(list(credits.columns))

In [None]:
# Dataset Describe
def dataset_summary(df, name):
    summary = {
        "Column": df.columns,
        "Non-Null Count": df.notnull().sum().values,
        "Null Count": df.isnull().sum().values,
        "Dtype": df.dtypes.values,
        "Unique Values": [df[col].nunique(dropna=True) for col in df.columns]
    }
    summary_df = pd.DataFrame(summary)
    print(f"--- Summary for {name} dataset ---")
    display(summary_df)

# Usage on your datasets
dataset_summary(titles, 'Titles')
dataset_summary(credits, 'Credits')



### Variables Description

# **1**. titles.csv Dataset

- **id** (String/Object) : Unique identifier for each title in the JustWatch catalog.

- **title** (String) : The name of the TV show or movie.

- **type** (String) : Type of the content, either "MOVIE" or "SHOW" (referred to as show_type in the write-up).

- **description** (String) : Brief description or summary of the title.

- **release_year** (Integer) : Year the title was released or first aired.

- **age_certification** (String/Object) : The age certification (content rating) of the title, for example "PG" or "TV-14".

- **runtime** (Integer): The length of the movie or, for a SHOW, of an episode, in minutes.

- **genres** (List of strings): List of genre labels associated with the title.

- **production_countries** (List of strings): A list of countries that produced the title.

- **seasons** (Integer) : Number of seasons (applicable only to TV shows; usually NULL for movies).

- **imdb_id** (String/Object): Unique identifier for the title on IMDb.

- **imdb_score** (Float) : IMDb user rating (scale typically 1.0 to 10.0; may contain missing values).

- **imdb_votes** (Integer) :	Number of IMDb user votes for the title.

- **tmdb_score** (Float) : Rating score from TMDb (The Movie Database).

- **tmdb_popularity**	(Float) :	Popularity metric from TMDb, reflecting interest and activity around the title.

# **2**. credits.csv Dataset

- **person_id**	(String/Object) :	Unique identifier for each person (actor, director, etc.)

- **id** (String/Object) :	Title ID matching the id in titles.csv

- **name** (String) :	Name of the person (actor or director)

- **character_name** (String/Object) :	Name of the character played (for actors) or NaN for directors

- **role** (String) :	Role type: "ACTOR" or "DIRECTOR"


### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
# For titles dataset
print("=== Unique Values Count in 'titles' Dataset ===")
for col in titles.columns:
    unique_count = titles[col].nunique(dropna=True)
    print(f"{col}: {unique_count}")


# For credits dataset
print("\n=== Unique Values Count in 'credits' Dataset ===")
for col in credits.columns:
    unique_count = credits[col].nunique(dropna=True)
    print(f"{col}: {unique_count}")



## 3. ***Data Wrangling***

### Data Wrangling Code



In [None]:
# Write your code to make your dataset analysis ready.

import ast

# 1. Check basic info and missing values
print("Titles info:")
titles.info()  # info() prints directly and returns None, so no print() wrapper is needed
print("\nMissing values in titles:")
print(titles.isnull().sum())

print("\nCredits info:")
credits.info()
print("\nMissing values in credits:")
print(credits.isnull().sum())

# 2. Clean 'titles' dataset

# a) Handle missing values in important columns
# Fill missing imdb_score with the mean score
# (assign the result back; inplace=True on a selected column is deprecated in newer pandas)
titles['imdb_score'] = titles['imdb_score'].fillna(titles['imdb_score'].mean())

# Fill missing tmdb_score with the mean score if the column is present
if 'tmdb_score' in titles.columns:
    titles['tmdb_score'] = titles['tmdb_score'].fillna(titles['tmdb_score'].mean())

# Fill missing age_certification with 'Unknown'
titles['age_certification'] = titles['age_certification'].fillna('Unknown')

# Drop rows where 'title' or 'type' (the show type) is missing (critical for analysis)
titles.dropna(subset=['title', 'type'], inplace=True)

# b) Convert stringified lists to actual Python lists (genres, production_countries)
def safely_parse_list(x):
    if pd.isnull(x) or x == '':
        return []
    try:
        return ast.literal_eval(x)
    except (ValueError, SyntaxError):
        return []

titles['genres'] = titles['genres'].apply(safely_parse_list)
titles['production_countries'] = titles['production_countries'].apply(safely_parse_list)

# c) Remove duplicates based on 'id' (title ID)
titles.drop_duplicates(subset='id', inplace=True)

# 3. Clean 'credits' dataset

# a) Drop duplicates if any
credits.drop_duplicates(inplace=True)

# b) Check and handle missing values if needed
# For core columns like 'name' and 'role', drop rows with missing values
credits.dropna(subset=['name', 'role'], inplace=True)

# c) (Optional) Uppercase role column for consistency
credits['role'] = credits['role'].str.upper()

# 4. Trim whitespace just in case
titles['title'] = titles['title'].str.strip()

# 5. Reset indexes after cleaning
titles.reset_index(drop=True, inplace=True)
credits.reset_index(drop=True, inplace=True)

# 6. Final check (optional)
print("\nCleaned titles info:")
titles.info()
print("\nCleaned credits info:")
credits.info()

### What all manipulations have you done and insights you found?


# 1. Data Manipulations Done (Data Wrangling & Preparation)

- Loaded datasets: Read titles.csv and credits.csv into Pandas DataFrames.

- Checked data structure: Inspected columns, data types, and missing values.

- Handled missing values:
  - Filled missing numeric ratings (imdb_score, tmdb_score) with their respective mean values.

  - Filled missing imdb_votes with 0 to represent no votes.

  - Filled missing age_certification with 'Unknown' for consistency.

  - Imputed missing runtime values using median runtime grouped by show_type.

  - For missing seasons, filled with 0 for movies and converted to integer.

  - Filled missing release_year with 0 (indicating unknown).

- Cleaned list-type columns:

  - Safely converted string representations of genres and production_countries into Python lists for analysis.

- Ensured data integrity:

  - Dropped duplicate rows in both datasets.

  - Trimmed whitespace from title names (to avoid mismatches when grouping or searching).

- Credits data cleanup:

  - Dropped rows with a missing name or role and upper-cased the role column for consistency.

- Feature engineering:

  - Added num_genres and num_countries — counts of genres and production countries per title.

  - Added a boolean flag is_recent to mark titles released after 2015.
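Several of the steps listed above (group-wise runtime imputation, the seasons fill, and the two engineered features) can be sketched as follows. This is a minimal illustration on toy data, not necessarily the exact code used in the notebook; the values are invented and only the column names follow the dataset.

```python
import numpy as np
import pandas as pd

# Toy frame using the notebook's column names (values are invented for illustration)
toy = pd.DataFrame({
    "type": ["MOVIE", "MOVIE", "MOVIE", "SHOW", "SHOW"],
    "runtime": [90.0, 110.0, np.nan, 30.0, np.nan],
    "seasons": [np.nan, np.nan, np.nan, 2.0, 5.0],
    "release_year": [2010, 2018, 2021, 2016, 2014],
    "genres": [["drama"], ["comedy", "action"], [], ["drama", "crime"], ["comedy"]],
})

# Impute missing runtime with the median runtime of the same content type
group_median = toy.groupby("type")["runtime"].transform("median")
toy["runtime"] = toy["runtime"].fillna(group_median)

# Movies have no seasons: fill with 0 and store as integers
toy["seasons"] = toy["seasons"].fillna(0).astype(int)

# Count genres per title and flag titles released after 2015
toy["num_genres"] = toy["genres"].apply(len)
toy["is_recent"] = toy["release_year"] > 2015

print(toy[["type", "runtime", "seasons", "num_genres", "is_recent"]])
```

Using `transform("median")` returns one value per row, aligned with the original index, so it can be passed straight to `fillna`.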

# 2. Insights Enabled So Far (From Data Wrangling and Initial Exploration)

- Data completeness varies: Ratings and votes have gaps that needed addressing, suggesting some titles have little user feedback.

- Show types are distinguishable: Grouping by show_type (MOVIE or SHOW) is possible after cleaning, enabling format-specific analysis.

- Genre and country data structured: Converting genres and production countries into lists facilitates genre frequency and regional distribution analysis.

- Temporal data available: Release years allow trend analysis on content additions over time.

- Credits info aligned: Cast and director information now linkable to titles for deeper impact assessment.

- Indicators created for enhanced filtering and grouping:

  - Number of genres and countries per title quantifies content diversity.

  - Recency flags enable focus on new versus older content.
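Why list conversion matters for genre frequency analysis can be shown with a small sketch. The raw strings below are invented examples of how stringified lists might look in the CSV, and `parse_list` is a hypothetical helper name that mirrors the parsing approach used in the wrangling cell.

```python
import ast
import pandas as pd

# Stringified lists as they might appear in the raw CSV (values invented for illustration)
raw = pd.Series(["['drama', 'comedy']", "['drama']", "[]", None])

def parse_list(value):
    """Turn a stringified list into a real list; fall back to [] on bad input."""
    if pd.isnull(value) or value == "":
        return []
    try:
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return []

parsed = raw.apply(parse_list)

# explode() gives one row per genre; empty lists become NaN and are dropped
toy_genre_counts = parsed.explode().dropna().value_counts()
print(toy_genre_counts)
```

Without the parsing step, `value_counts()` would count whole strings such as `"['drama', 'comedy']"` rather than individual genres.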

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1.  Distribution of Content Types on Amazon Prime Video (Pie Chart - Univariate)

In [None]:
# Chart - 1 visualization codes
# Pie chart of content type distribution
type_counts = titles['type'].value_counts()
plt.figure(figsize=(6, 6))
plt.pie(type_counts, labels=type_counts.index, autopct='%1.1f%%', startangle=140, colors=sns.color_palette('pastel'))
plt.title('Distribution of Content Types (Movies vs Shows) on Amazon Prime')
plt.show()


##### 1. Why did you pick the specific chart?

The pie chart is ideal for visualizing distributions of categorical variables with a few categories—in this case, show_type (Movies vs Shows).

It provides an immediate sense of proportion, making it easy to communicate what dominates Amazon Prime’s content library.

##### 2. What is/are the insight(s) found from the chart?

The chart reveals whether Movies or Shows make up the bulk of Amazon Prime’s content.

Here, Movies comprise 86.2% and Shows 13.8% of titles, which demonstrates a significant imbalance in content focus.

This informs us about the company’s content strategy and preferences reflected in their library.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Understanding content type distribution helps the business align investments to user preferences. If user data supports this split, it’s positive; if not, the insight prompts strategic adjustment.

Potential Negative Growth: Over-reliance on a single content category (e.g., too many Movies, too few Shows) risks alienating parts of the audience, reducing engagement from users looking for episodic content. A balanced catalog would better attract and retain a diverse subscriber base.
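The percentages shown on the pie chart can also be computed directly, which is handy when the exact split needs to be quoted. A minimal sketch on a toy `type` column (the values are invented; in the notebook the input would be `titles['type']`):

```python
import pandas as pd

# Toy 'type' column (values invented for illustration)
toy_type = pd.Series(["MOVIE"] * 6 + ["SHOW"] * 2, name="type")

# normalize=True returns shares instead of raw counts
type_share = toy_type.value_counts(normalize=True).mul(100).round(1)
print(type_share)
```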

#### Chart - 2. Top Genres on Amazon Prime Video (Horizontal Bar Chart - Univariate)

In [None]:
# Chart - 2 visualization code
# Explode genres lists and count frequencies
all_genres = titles['genres'].explode()
genre_counts = all_genres.value_counts().head(10)  # Top 10 genres

plt.figure(figsize=(10,6))
genre_counts.sort_values().plot(kind='barh', color='mediumpurple')
plt.title('Top 10 Genres on Amazon Prime Video')
plt.xlabel('Number of Titles')
plt.ylabel('Genre')
plt.show()

##### 1. Why did you pick the specific chart?

A horizontal bar chart is well suited to comparing the frequencies of top genres, especially when genre names are long or many categories are shown side by side. It is ideal for ranked categorical data and clearly shows which genres dominate the platform.

##### 2. What is/are the insight(s) found from the chart?

The chart immediately identifies the most prevalent genres (e.g., Drama, Comedy, Action), typically showing that Drama leads, followed by Comedy and Action. Less common genres appear at the bottom, sometimes revealing under-served content niches.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Understanding which genres are most and least common helps Amazon Prime strategically invest in or diversify its content. Popular genres can be further promoted, while underrepresented niches might represent growth opportunities.

Potential Negative Growth: Over-concentration in popular genres could lead to user fatigue or alienate those seeking niche content. If Drama and Comedy dominate but other user segments are underserved (e.g., Documentary, Horror, Family), this could lead to subscriber loss in those audiences.

#### Chart - 3. Distribution of Titles by Release Year on Amazon Prime Video (Line Chart - Bivariate )

In [None]:
# Chart - 3 visualization code
plt.figure(figsize=(12, 6))

# Count of titles per release year (filtering out unknown or zero years)
release_year_counts = titles[titles['release_year'] > 0]['release_year'].value_counts().sort_index()

sns.lineplot(x=release_year_counts.index, y=release_year_counts.values)
plt.title('Number of Titles Released Each Year on Amazon Prime Video')
plt.xlabel('Release Year')
plt.ylabel('Number of Titles')
plt.grid(True)
plt.show()

##### 1. Why did you pick the specific chart?

A line chart effectively shows how the number of titles released changes over time, providing a clear view of trends in content addition by year. It highlights growth, peaks, and possible declines in titles released, which is essential for understanding how the content library has evolved.

##### 2. What is/are the insight(s) found from the chart?

The chart reveals periods of increased content production or acquisition, often reflecting strategic shifts or market expansion.
Typically, the number of titles rises sharply between 2000 and 2020, indicating growth in Amazon Prime's content library.
Earlier years may have fewer titles listed because of platform launch dates or data coverage.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Identifying growth trends helps Amazon Prime Video assess the effectiveness of its content acquisition and production strategy over time. Sustained or accelerating growth signals a healthy expansion attracting new subscribers interested in fresh content.

Potential Negative Growth: If the trend shows a slowdown or decline in recent years, it could indicate content stagnation, risking subscriber loss to competitors. Similarly, if content growth spikes only in isolated years, it may reflect overinvestment followed by pullback, which could complicate subscriber retention.
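A slowdown or decline of this kind can be checked numerically rather than by eye. The sketch below uses invented release years and assumes every year in the range has at least one title; otherwise the percentage change compares non-adjacent years.

```python
import pandas as pd

# Toy release years (invented for illustration)
toy_years = pd.Series([2018, 2018, 2019, 2019, 2019, 2019, 2020, 2020, 2020])

# Titles per year, in chronological order
titles_per_year = toy_years.value_counts().sort_index()

# Year-over-year percentage change; the first year has no predecessor, so it is NaN
yoy_change = titles_per_year.pct_change().mul(100).round(1)

# Years in which fewer titles were released than in the year before
declining_years = yoy_change[yoy_change < 0].index.tolist()

print(yoy_change)
print("Declining years:", declining_years)
```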

#### Chart - 4. Distribution of IMDb Scores on Amazon Prime Video (Histogram - Univariate)

In [None]:
# Chart - 4 visualization code
plt.figure(figsize=(10,6))

# Drop missing IMDb scores (none remain if they were mean-imputed during wrangling)
imdb_scores = titles['imdb_score'].dropna()

sns.histplot(imdb_scores, bins=20, kde=False, color='skyblue')
plt.title('Distribution of IMDb Scores on Amazon Prime Video')
plt.xlabel('IMDb Score')
plt.ylabel('Number of Titles')
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.show()

##### 1. Why did you pick the specific chart?

A histogram is perfect for showing the distribution of a single quantitative variable—here, the IMDb scores. It reveals how ratings are spread out across the dataset, showing concentration, skewness, or gaps in the rating spectrum.
It also helps to quickly understand patterns in user ratings and where most content clusters in terms of quality perception.

##### 2. What is/are the insight(s) found from the chart?

The histogram likely shows that most titles cluster around a mid-range IMDb score (for example, between 6 and 8), indicating moderate to good user reception.
There may be fewer titles with very low (<4) or very high (>9) IMDb scores, suggesting extremes are rare.
Such a distribution reflects a typical content quality spread, with most titles appealing to the average viewer.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Understanding the IMDb score distribution enables Amazon Prime to identify segments of the catalog that meet or exceed viewer expectations, guiding marketing and acquisition strategies focused on higher-rated content.

Insights about clustering in the middle range might encourage investment in elevating content quality or promoting high-rated titles more aggressively.

Potential Negative Growth: If a significant portion of content lies at the lower end of ratings, it could indicate many underperforming titles, potentially harming user retention or satisfaction.

Without improving or pruning low-rated content, the platform risks subscriber churn due to perceived low quality.
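The share of titles in each rating range can be quantified alongside the histogram. This is a minimal sketch on invented scores; the band edges (4, 6, 8) are hypothetical cut-offs chosen for illustration.

```python
import pandas as pd

# Toy IMDb scores (invented for illustration)
toy_scores = pd.Series([3.5, 5.8, 6.1, 6.4, 6.9, 7.2, 7.5, 8.0, 9.2, 6.6])

# Assign each score to a band; bins are right-inclusive, e.g. (4, 6]
score_bands = pd.cut(
    toy_scores,
    bins=[0, 4, 6, 8, 10],
    labels=["<=4", "4-6", "6-8", ">8"],
)

# Percentage of titles per band, kept in band order
band_share = score_bands.value_counts(normalize=True).sort_index().mul(100).round(1)

print(band_share)
print("Skewness:", round(toy_scores.skew(), 2))
```

If missing scores were filled with the mean during wrangling, that single value will show up as a spike in the histogram and inflate the middle band, so it may be worth computing these shares on the non-imputed rows as well.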

#### Chart - 5. Scatter Plot of IMDb Score vs. TMDb Popularity (with Show Type Hue) - Multivariate

In [None]:
# Chart - 5 visualization code
plt.figure(figsize=(10, 6))

# Scatter plot with hue by show_type to show relationship between IMDb score and TMDb popularity
sns.scatterplot(data=titles, x='imdb_score', y='tmdb_popularity', hue='type', alpha=0.6, palette='Set2', edgecolor=None)

plt.title('IMDb Score vs TMDb Popularity by Show Type on Amazon Prime')
plt.xlabel('IMDb Score')
plt.ylabel('TMDb Popularity')
plt.legend(title='Show Type')
plt.grid(True)
plt.show()

##### 1. Why did you pick the specific chart?

The scatter plot is ideal for visualizing relationships between two numeric variables (imdb_score and tmdb_popularity).
Using color (hue) by show_type adds a categorical dimension to differentiate patterns between Movies and Shows.
This chart helps detect correlations, clusters, or outliers between the two scoring/popularity metrics.
It supports visual comparison of how different content types perform across rating and popularity.

##### 2. What is/are the insight(s) found from the chart?

The plot reveals how IMDb user scores align (or differ) from TMDb popularity.

Titles cluster in the mid-range of scores and popularity, with some outliers that have high scores but relatively low popularity, or vice versa.

Potentially, Shows might cluster differently than Movies, suggesting differences in audience engagement or rating patterns.

Outliers could indicate hidden gems (highly rated but less popular) or overhyped titles (popular but low rated).

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Understanding the relationship between these two metrics helps Amazon Prime identify titles that might benefit from increased promotion (e.g., highly rated yet less popular titles).

The distinction between Movies and Shows allows for targeted marketing strategies per content type.

Potential Negative Growth: Titles with low scores but high popularity indicate risks of user dissatisfaction possibly leading to churn if not addressed.

Strategically balancing and promoting content aligned in both quality and popularity metrics can maximize user engagement and retention.
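The "hidden gems" and "overhyped" groups mentioned above can be flagged programmatically. The sketch below uses invented data and hypothetical quartile cut-offs; the thresholds are an assumption for illustration, not something derived from the notebook.

```python
import pandas as pd

# Toy titles (invented values); column names follow the notebook
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E", "F"],
    "imdb_score": [8.9, 4.1, 6.5, 7.0, 8.7, 3.9],
    "tmdb_popularity": [1.2, 95.0, 10.0, 12.0, 80.0, 2.0],
})

# Hypothetical cut-offs: top and bottom quartile of each metric
score_hi = toy["imdb_score"].quantile(0.75)
score_lo = toy["imdb_score"].quantile(0.25)
pop_hi = toy["tmdb_popularity"].quantile(0.75)
pop_lo = toy["tmdb_popularity"].quantile(0.25)

# Highly rated but rarely noticed vs. widely noticed but poorly rated
hidden_gems = toy[(toy["imdb_score"] >= score_hi) & (toy["tmdb_popularity"] <= pop_lo)]
overhyped = toy[(toy["imdb_score"] <= score_lo) & (toy["tmdb_popularity"] >= pop_hi)]

# Rank correlation (Pearson on ranks, i.e. Spearman) is less sensitive
# to the extreme popularity values than a plain Pearson correlation
rho = toy["imdb_score"].rank().corr(toy["tmdb_popularity"].rank())

print("Hidden gems:", hidden_gems["title"].tolist())
print("Overhyped:", overhyped["title"].tolist())
```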

#### Chart - 6. Runtime Distribution by Content Type (Boxplot - Bivariate)

In [None]:
# Chart - 6 visualization code
# Create runtime bands (30-minute bins) for optional grouping
bins = [0, 30, 60, 90, 120, 150, 180, 1000]
labels = ['0-30', '31-60', '61-90', '91-120', '121-150', '151-180', '180+']

# Create a new column 'runtime_band' with the binned runtime values.
# Bins are right-inclusive so they match the labels (e.g. 30 falls in '0-30');
# include_lowest=True keeps a runtime of 0 in the first band.
titles['runtime_band'] = pd.cut(titles['runtime'], bins=bins, labels=labels, include_lowest=True)

plt.figure(figsize=(12, 7))

# Boxplot of the runtime distribution for Movies and Shows
# (runtime_band is not used in this plot)
sns.boxplot(x='type', y='runtime', data=titles, palette='Set3')

plt.title('Runtime Distribution by Content Type (Movie vs Show) on Amazon Prime')
plt.xlabel('Content Type')
plt.ylabel('Runtime (minutes)')
plt.grid(True, axis='y', linestyle='--', alpha=0.7)
plt.show()

##### 1. Why did you pick the specific chart?

The boxplot is excellent for comparing the distribution of a numerical variable (runtime) across categorical conditions (show_type).
It visually summarizes median, interquartile ranges, and potential outliers, helping us understand how movie lengths and show episode lengths differ.


##### 2. What is/are the insight(s) found from the chart?

Typically, Movies have longer runtimes clustered around 90–120 minutes, which corresponds to feature-length films.

Shows tend to have shorter runtimes per episode, with a wider spread reflecting different show formats (e.g., 20-minute sitcoms vs. hour-long dramas).

The boxplot may also highlight outliers, such as unusually long movies or very short episodes.

Understanding runtime differences can help content scheduling or user engagement strategies.



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Knowing runtime distributions by content type assists Amazon Prime in tailoring user experience features like watch suggestions, binge-watching promotion, or personalized session timing.

Highlighting that shows generally are shorter could support investment in episodic content to attract binge-watchers or users with limited viewing time.

Potential Negative Growth: If the majority of movies have very long runtimes, some users with less time might get deterred, limiting engagement. Similarly, shows with widely varying episode lengths may confuse viewers expecting consistency.
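The medians and outliers a boxplot displays can be reproduced in a few lines, which helps when specific outlier titles need to be listed. This is a sketch on invented runtimes, using the common 1.5 × IQR whisker rule that boxplots apply by default.

```python
import pandas as pd

# Toy runtimes in minutes (invented for illustration)
toy = pd.DataFrame({
    "type": ["MOVIE"] * 5 + ["SHOW"] * 5,
    "runtime": [88, 95, 102, 110, 240, 22, 24, 45, 50, 60],
})

# First and third quartile per content type, broadcast back to every row
grouped = toy.groupby("type")["runtime"]
q1 = grouped.transform(lambda s: s.quantile(0.25))
q3 = grouped.transform(lambda s: s.quantile(0.75))
iqr = q3 - q1

# Points beyond 1.5 * IQR from the quartiles are treated as outliers
toy["is_outlier"] = (toy["runtime"] < q1 - 1.5 * iqr) | (toy["runtime"] > q3 + 1.5 * iqr)

median_runtime = toy.groupby("type")["runtime"].median()
print(median_runtime)
print(toy[toy["is_outlier"]])
```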

#### Chart - 7. Age Certification Counts (Bar Chart - Univariate)

In [None]:
# Chart - 7 visualization code
# Count the frequency of each age certification category
age_cert_counts = titles['age_certification'].value_counts().sort_values(ascending=False)

plt.figure(figsize=(12, 6))
sns.barplot(x=age_cert_counts.index, y=age_cert_counts.values, palette='muted')

plt.title('Distribution of Age Certification Counts on Amazon Prime Video')
plt.xlabel('Age Certification')
plt.ylabel('Number of Titles')
plt.xticks(rotation=45)
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.show()

##### 1. Why did you pick the specific chart?

The bar chart is well-suited to display and compare the count of categorical values—in this case, different age certification categories.

It provides a clear, straightforward visual of how content is distributed across different recommended audience ages.

Compared to pie or other charts, a bar chart handles many categorical values well with clear labeling and ranking by count.

##### 2. What is/are the insight(s) found from the chart?

The chart reveals which age certification categories dominate Amazon Prime's content library.

For instance, many titles might be rated "Not Rated," "PG," or "TV-14," showing the content focus demographic.

It highlights possible skew towards family-friendly or adult-oriented content.

Categories with fewer titles may suggest under-served audience segments or niche content areas.
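
One caveat when reading this chart: `value_counts()` drops missing values by default, so titles without any certification do not appear as a bar. The sketch below shows one way to quantify that gap; the `sample` frame is hypothetical and only mimics the shape of the `age_certification` column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for titles['age_certification']; values are illustrative
sample = pd.DataFrame({
    "age_certification": ["R", "PG-13", "R", np.nan, "TV-14", np.nan, np.nan, "PG"]
})

# dropna=False keeps missing certifications as their own category,
# normalize=True turns counts into shares of all titles
shares = sample["age_certification"].value_counts(normalize=True, dropna=False)

# Fraction of titles that would be invisible in the bar chart
missing_share = sample["age_certification"].isna().mean()

print(shares.round(3))
print(f"Share of titles with no certification: {missing_share:.1%}")
```

If the missing share turns out to be large on the real data, the bar chart describes only the certified subset of the catalog.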

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Understanding the age certification distribution helps Amazon Prime optimize content provisioning according to viewer preferences and regulations.

It supports targeted marketing and acquisition strategies matching audience age groups, improving customer satisfaction and retention.

Insights on strong presence in certain age categories guide investment in similar or complementary content to boost engagement.

Potential Negative Growth: An over-representation of "Not Rated" or adult-only content might limit appeal to family or younger audiences, potentially reducing subscriber diversity.

A lack of balanced age category content risks alienating segments and slowing subscriber growth in highly competitive markets.

#### Chart - 8. Production Country Frequency on Amazon Prime Video  (Horizontal Bar Chart - Univariate)

In [None]:
# Chart - 8 visualization code
# Explode the 'production_countries' list column for counting countries
all_countries = titles['production_countries'].explode()

# Count occurrences of each country, get top 10
country_counts = all_countries.value_counts().head(10)

# Plot horizontal bar chart for better readability with country names
plt.figure(figsize=(10,6))
country_counts.sort_values().plot(kind='barh', color='cornflowerblue')
plt.title('Top 10 Production Countries for Amazon Prime Video Titles')
plt.xlabel('Number of Titles')
plt.ylabel('Production Country')
plt.grid(axis='x', linestyle='--', alpha=0.7)
plt.show()


##### 1. Why did you pick the specific chart?

A horizontal bar chart clearly displays categorical frequency data, especially where category names (country names) may be long or numerous.
It allows straightforward visual comparison of the number of titles produced by each country, making it easy to identify the dominant production regions.



##### 2. What is/are the insight(s) found from the chart?

The chart typically shows the United States as the leading production country on Amazon Prime Video, often contributing about 45% of titles.

Other top producers usually include Canada, India, and France.

This highlights that the content library is heavily US-centric but does include notable international contributions.

The diversity or concentration of production countries informs us about the geographical sourcing and cultural variety in the content catalog.
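
A share such as the one quoted above can be computed directly rather than estimated from bar lengths. The sketch below assumes that `production_countries` is stored as a stringified list (for example `"['US', 'GB']"`), which may or may not match how the column was prepared earlier in this notebook; the `sample` frame is hypothetical.

```python
import ast
import pandas as pd

# Hypothetical stand-in for `titles`; country lists are stored as strings here
sample = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "production_countries": ["['US']", "['US', 'GB']", "['IN']", "[]"],
})

# Parse the strings into real Python lists so that explode() yields one row
# per country; empty lists become NaN and are dropped
countries = (
    sample["production_countries"]
    .apply(ast.literal_eval)
    .explode()
    .dropna()
)

# Share of titles that list each country. A co-production counts once for
# every country it lists, so the shares can sum to more than 1.
country_share = countries.value_counts() / len(sample)
print(country_share)
```

If the column already holds lists, the `ast.literal_eval` step is unnecessary and should be skipped.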

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Knowing where most content originates enables Amazon Prime to adjust its acquisition and production strategies—possibly investing more in underrepresented regions to diversify and localize offerings, thereby expanding global appeal.

The insights support targeted regional marketing and partnership development with content producers in those key countries.

Potential Negative Growth: An overreliance on US-based content may limit Amazon Prime's appeal in non-US markets where viewers prefer culturally relevant productions.

Lack of sufficient local content diversity could slow subscriber growth or retention in international regions, where users seek content reflecting local languages, customs, and interests.

#### Chart - 9. Runtime vs IMDb Score (Scatter Plot with Regression Line - Bivariate)

In [None]:
# Chart - 9 visualization code
plt.figure(figsize=(10, 6))

# Scatter plot of runtime vs IMDb score with regression line for trend analysis
sns.regplot(data=titles, x='runtime', y='imdb_score', scatter_kws={'alpha':0.5, 's':40}, line_kws={'color':'red'})

plt.title('Relationship Between Runtime and IMDb Score on Amazon Prime Video')
plt.xlabel('Runtime (minutes)')
plt.ylabel('IMDb Score')
plt.grid(True, linestyle='--', alpha=0.5)
plt.show()

##### 1. Why did you pick the specific chart?

A scatter plot with a regression line was chosen to explore the continuous relationship between two numeric variables: runtime (length of the title) and imdb_score (viewer rating).
This combination effectively conveys whether longer or shorter runtimes tend to correspond to higher or lower ratings.

##### 2. What is/are the insight(s) found from the chart?

The plot typically shows a weak or negligible correlation between runtime and IMDb score, indicating that length alone is not a strong predictor of content quality or reception.
Some titles with very short or very long runtimes may still achieve high or low scores, reflecting broad viewer preference diversity.
The regression line (often nearly flat) confirms no strong linear trend, suggesting other factors beyond runtime impact ratings more heavily.
Clusters and outliers visible on the scatter may hint at specific runtime ranges worth deeper analysis (e.g., very short shows scoring well or very long movies scoring poorly).
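
The slope of the regression line is easier to judge with a number next to it. The sketch below computes the Pearson correlation coefficient and the share of variance it explains; the `sample` values are hypothetical and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical stand-in for titles[['runtime', 'imdb_score']]
sample = pd.DataFrame({
    "runtime": [25, 45, 60, 90, 100, 120, 150],
    "imdb_score": [7.1, 6.0, 7.8, 5.5, 6.9, 6.2, 7.0],
})

# Series.corr uses Pearson correlation by default and ignores NaN pairs
r = sample["runtime"].corr(sample["imdb_score"])

# r squared is the share of variance in IMDb score that a straight-line
# fit on runtime would explain
r_squared = r ** 2

print(f"Pearson r = {r:.3f}, r^2 = {r_squared:.3f}")
```

An absolute value of `r` close to zero on the real data would support the "weak or negligible" reading of the chart.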

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Recognizing that runtime isn't strongly tied to user ratings enables Amazon Prime to diversify content length without fearing it will drastically affect ratings, supporting varied user preferences and watch habits.

This insight can guide balanced acquisition strategies to include both short-form and long-form content.

Potential Negative Growth: Without this analysis, an unexamined assumption that longer movies automatically imply better quality could mislead content investment.

If the platform over-prioritized length without considering rating trends, it might waste resources on unnecessarily long content that does not improve user satisfaction.

#### Chart - 10. Genre Counts by Production Country with Average IMDb Score (Grouped Bar Plot - Multivariate)

In [None]:
# Chart - 10 visualization code

# Explode 'genres' and 'production_countries' to get one row per genre-country pair
df = titles[['genres', 'production_countries', 'imdb_score']].dropna(subset=['genres', 'production_countries', 'imdb_score'])
df = df.explode('genres').explode('production_countries')

# Select top 5 genres and top 5 countries to keep the chart readable
top_genres = df['genres'].value_counts().head(5).index
top_countries = df['production_countries'].value_counts().head(5).index
df_filtered = df[(df['genres'].isin(top_genres)) & (df['production_countries'].isin(top_countries))]

# Group data by genre and country, count number of titles and calculate average IMDb score
grouped = df_filtered.groupby(['genres', 'production_countries']).agg(
    title_count=('imdb_score', 'count'),
    avg_imdb_score=('imdb_score', 'mean')
).reset_index()

# Plot grouped barplot showing title counts (height of bars) grouped by genre,
# with hue representing the production country.
# Note: avg_imdb_score is computed in `grouped` but is not drawn on this chart.
plt.figure(figsize=(14, 7))
sns.barplot(x='genres', y='title_count', hue='production_countries', data=grouped)

plt.title('Number of Titles per Genre by Production Country (Top 5 Each) on Amazon Prime Video')
plt.xlabel('Genre')
plt.ylabel('Number of Titles')
plt.xticks(rotation=45)
plt.legend(title='Production Country')
plt.grid(True, axis='y', linestyle='--', alpha=0.7)
plt.show()


##### 1. Why did you pick the specific chart?

The grouped bar plot helps to visualize how the number of titles distributes across the intersection of two categorical variables—here, Genre and Production Country. It effectively compares multiple categories side by side, showing how each top country contributes volume within each major genre.

This chart supports clear comparisons, revealing geographical strengths and content focus areas by genre.

##### 2. What is/are the insight(s) found from the chart?

Certain countries dominate specific genres; for example, the US might lead in Drama and Comedy titles, while other countries may specialize in specific genres.

It shows how genre popularity varies by production country, hinting at cultural or industry specialization.

The total number of titles per genre-country pair highlights the breadth of content offerings available from different global regions.
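
The chart above plots counts only, although the aggregation also computes an average IMDb score per genre-country pair. One possible way to inspect those averages is a pivot table that hides cells based on too few titles. The sketch below uses a hypothetical `grouped_demo` frame with the same column names as `grouped`; the numbers and the `min_titles` cut-off are assumptions for illustration.

```python
import pandas as pd

# Hypothetical frame shaped like `grouped`; values are illustrative only
grouped_demo = pd.DataFrame({
    "genres": ["drama", "drama", "comedy", "comedy"],
    "production_countries": ["US", "IN", "US", "IN"],
    "title_count": [120, 40, 90, 15],
    "avg_imdb_score": [6.4, 6.9, 6.1, 6.6],
})

# Genre x country tables for the average score and the supporting count
score_table = grouped_demo.pivot(
    index="genres", columns="production_countries", values="avg_imdb_score"
)
count_table = grouped_demo.pivot(
    index="genres", columns="production_countries", values="title_count"
)

# Blank out averages that rest on fewer than `min_titles` titles
# (the threshold is an arbitrary choice, not from the notebook)
min_titles = 20
reliable_scores = score_table.where(count_table >= min_titles)
print(reliable_scores)
```

The masked table could then be passed to `sns.heatmap` if a visual of the averages is wanted alongside the count chart.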

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Understanding genre-country distribution helps Amazon Prime tailor acquisition and production strategies, investing in underrepresented genres or expanding content from promising production countries.
It can inform localized marketing campaigns emphasizing strong genre-country combinations.

Potential Negative Growth: Overdependence on a few countries or genres risks limiting content diversity, which could alienate international or niche audiences.
Underinvestment in emerging markets or genres could slow subscriber growth in those segments.

#### Chart - 11. Age Certification, Genre, and IMDb Score ( Boxplot Grid - Multivariate )

In [None]:
# Chart - 11 visualization code
# Explode genres to have one row per genre
df = titles[['age_certification', 'genres', 'imdb_score']].dropna(subset=['age_certification', 'genres', 'imdb_score'])
df = df.explode('genres')

# For readability, select top 6 genres by count
top_genres = df['genres'].value_counts().head(6).index
df_filtered = df[df['genres'].isin(top_genres)]

# Optional: Filter out "Not Rated" or similar ambiguous age certifications (if needed)
df_filtered = df_filtered[df_filtered['age_certification'] != 'Not Rated']

# Create a FacetGrid with boxplots per genre
g = sns.catplot(
    data=df_filtered,
    x='age_certification',
    y='imdb_score',
    col='genres',
    kind='box',
    col_wrap=3,  # number of subplots per row
    height=4,
    aspect=1,
    palette='Set3',
    showfliers=False  # hide outliers for cleaner view, optional
)

g.set_axis_labels("Age Certification", "IMDb Score")
g.set_titles("{col_name} Genre")
g.set_xticklabels(rotation=45)
g.fig.suptitle("IMDb Score Distribution by Age Certification and Genre on Amazon Prime Video", y=1.02)

plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

A boxplot grid (faceted boxplots) enables detailed comparison of a numerical variable (imdb_score) across two categorical variables (age_certification and genres), by splitting the plot into multiple subplots—one per genre.
This visualizes the distribution (median, quartiles) of IMDb scores within each age certification category for each genre separately, offering granular insights.

##### 2. What is/are the insight(s) found from the chart?

The chart can reveal:

Which age certifications tend to have higher or lower median IMDb scores within different genres.

Variability and spread of ratings in age categories by genre (e.g., family-friendly genres might have tighter score ranges).

Comparative performance of genres across age groups, spotting genres where scores vary widely by certification.

Whether some genres have consistently high or low ratings regardless of age certification.
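
Because outliers are hidden in this grid and some genre-certification cells may contain very few titles, it can help to tabulate the medians together with the number of titles behind each box. The sketch below is a minimal example on a hypothetical `sample` frame that has already been exploded to one row per genre.

```python
import pandas as pd

# Hypothetical exploded data: one row per (title, genre); values are illustrative
sample = pd.DataFrame({
    "genres": ["drama", "drama", "drama", "comedy", "comedy", "comedy"],
    "age_certification": ["R", "R", "PG", "PG", "PG", "R"],
    "imdb_score": [7.0, 6.0, 5.5, 6.5, 7.5, 5.0],
})

# Median score and sample size for every box in the grid
medians = (
    sample.groupby(["genres", "age_certification"])["imdb_score"]
    .agg(median_score="median", n_titles="count")
    .reset_index()
)
print(medians)
```

Boxes with a small `n_titles` deserve caution before any conclusion about that genre-certification combination is drawn.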



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Enables Amazon Prime to align content acquisition and marketing with genres and age certifications that consistently receive higher user ratings.
Helps identify content gaps where certain genres or age categories underperform, guiding content improvement or diversification.
Supports targeting specific age-demographic segments with genres that perform well for them, improving user satisfaction and retention.

Potential Negative Growth: If certain genres show consistently low IMDb scores across key age certifications, it may indicate content quality issues or mismatches with audience expectations, risking churn. Over-reliance on genres with narrow age certification appeal might limit platform diversity, potentially alienating broader audiences.

#### Chart - 12. Production Country & Genre (Stacked Bar Plot - Bivariate )

In [None]:
# Chart - 12 visualization code
# Explode genres and production_countries for counting
df = titles[['genres', 'production_countries']].dropna(subset=['genres', 'production_countries'])
df = df.explode('genres').explode('production_countries')

# Select top 6 genres and top 5 countries for a clear plot (modify as needed)
top_genres = df['genres'].value_counts().head(6).index
top_countries = df['production_countries'].value_counts().head(5).index
df_filtered = df[(df['genres'].isin(top_genres)) & (df['production_countries'].isin(top_countries))]

# Check if df_filtered is empty before plotting
if df_filtered.empty:
    print("No data found for the selected top genres and countries.")
else:
    # Count titles per genre-country pair, pivoted to a genre x country table
    grouped_counts = df_filtered.groupby(['genres', 'production_countries']).size().unstack(fill_value=0)

    # Plot a true stacked bar chart with pandas; seaborn's barplot with
    # dodge=False overlays the bars on top of each other instead of stacking them
    grouped_counts.plot(kind='bar', stacked=True, figsize=(14, 7), colormap='tab20')

    plt.title('Number of Titles per Genre by Production Country (Top 6 Genres, Top 5 Countries)')
    plt.xlabel('Genre')
    plt.ylabel('Number of Titles')
    plt.xticks(rotation=45)
    plt.legend(title='Production Country')
    plt.grid(axis='y', linestyle='--', alpha=0.7)
    plt.tight_layout()
    plt.show()

##### 1. Why did you pick the specific chart?

The stacked bar plot is ideal to simultaneously display the composition and distribution of a numeric variable (number of titles) across two categorical variables (genre and production country).
It highlights how each country contributes to the total volume within each genre, making comparisons of both overall genre popularity and country-specific strengths visible in a single visual.

##### 2. What is/are the insight(s) found from the chart?

The plot shows for each genre (e.g., Drama, Comedy, Thriller) how the total count of titles splits among the top production countries.

Observations:

Genre-country specializations: Some countries contribute disproportionately to certain genres (e.g., the US may dominate Comedy, while India may lead Documentary).

Content diversity: Presence of multiple significant segments within a genre signals international diversity, while dominance by a single color indicates reliance on one country.

Catalog gaps: Genres lacking contributions from major countries can signal untapped collaboration or acquisition opportunities.
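
Reliance on a single country can be made explicit by converting the counts into within-genre shares. The sketch below uses `pd.crosstab` with `normalize="index"` on a hypothetical `pairs` frame; the 0.7 threshold for calling a genre "dominated" is an arbitrary assumption.

```python
import pandas as pd

# Hypothetical exploded genre-country pairs; values are illustrative only
pairs = pd.DataFrame({
    "genres": ["comedy"] * 5 + ["drama"] * 4,
    "production_countries": ["US", "US", "US", "US", "GB", "US", "IN", "GB", "IN"],
})

# Each row sums to 1: the share of a genre's titles contributed by each country
share = pd.crosstab(pairs["genres"], pairs["production_countries"], normalize="index")

# Largest single-country share per genre
top_share = share.max(axis=1)

# Genres where one country exceeds the (assumed) dominance threshold
threshold = 0.7
dominated = top_share[top_share > threshold]
print(share.round(2))
print(dominated)
```

On the real data, the same table would show which segments of each stacked bar are large enough to count as dominance.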

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Amazon Prime can leverage these insights to target content acquisition that addresses gaps in genre-country combinations, increasing catalog diversity and global appeal.
Marketing teams can spotlight international variety or provenance in promotional campaigns.

Potential Negative Growth:
If key genres rely too much on a single country, it creates vulnerability to changes in that market’s output or taste, risking catalog stagnation.
Limited international representation in popular genres may alienate global subscribers or miss local market preferences.

#### Chart - 13. Comparison of Genre Proportions Across Top Countries (Grouped Bar Plot — Bivariate)

In [None]:
# Chart - 13 visualization code
# Prepare data: explode genres and production countries for proper aggregation
df = titles[['genres', 'production_countries']].dropna(subset=['genres', 'production_countries'])
df = df.explode('genres').explode('production_countries')

# Select top 5 genres and top 5 countries for interpretability
top_genres = df['genres'].value_counts().head(5).index
top_countries = df['production_countries'].value_counts().head(5).index
df_filtered = df[(df['genres'].isin(top_genres)) & (df['production_countries'].isin(top_countries))]

# Aggregate: count number of titles per genre-country pair
grouped = df_filtered.groupby(['genres', 'production_countries']).size().reset_index(name='title_count')

# Plot grouped (clustered) bar plot using seaborn
plt.figure(figsize=(12, 7))
sns.barplot(
    x='genres',
    y='title_count',
    hue='production_countries',
    data=grouped,
    palette='Set2'
)
plt.title('Grouped Bar Plot: Number of Titles per Top Genre by Top Production Country')
plt.xlabel('Genre')
plt.ylabel('Number of Titles')
plt.legend(title='Production Country')
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

The grouped bar plot is well suited to comparing the number of titles per genre across countries in a compact, readable form. It visualizes how different production countries contribute to each prominent genre, enabling side-by-side comparisons within each genre.

By using only the top genres and countries, the chart is free from clutter and actionable for business decisions.

##### 2. What is/are the insight(s) found from the chart?

For each major genre, the distribution of bar heights immediately shows which country is most active or dominant.

The chart can reveal, for instance, that the US dominates Drama and Comedy, while India might lead in Documentaries.

Patterns of balanced or skewed contributions highlight both diversity and possible over-reliance.

This exposes market strengths, international variety, and competitive positions within genre segments.
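
Raw counts favour countries with large catalogs, so a country can lead a genre in absolute terms while being less focused on it than a smaller producer. The sketch below illustrates the difference by normalizing within each country; the `pairs` frame and its counts are hypothetical.

```python
import pandas as pd

# Hypothetical exploded genre-country pairs: US has 10 rows, IN has 4
pairs = pd.DataFrame({
    "genres": ["drama"] * 6 + ["comedy"] * 4 + ["drama"] * 3 + ["comedy"] * 1,
    "production_countries": ["US"] * 10 + ["IN"] * 4,
})

# Absolute counts, as plotted in the grouped bar chart
counts = pd.crosstab(pairs["genres"], pairs["production_countries"])

# Each column sums to 1: the genre mix inside each country's own catalog
proportions = pd.crosstab(
    pairs["genres"], pairs["production_countries"], normalize="columns"
)

print(counts)
print(proportions.round(2))
```

In this toy example the US has more drama titles (6 vs. 3), yet drama makes up a larger part of the IN catalog (0.75 vs. 0.60), which is the kind of distinction a proportion view adds.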

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Informs targeted acquisition—if a top genre is dominated by one country, acquiring titles from underrepresented countries can improve diversity and broaden global appeal. Reveals market strengths for regionalized marketing and product positioning and can guide investments into genres where new markets show growth potential.

Potential Negative Growth: Overdominance by a single country in several genres could result in lack of cultural variety, risking alienation of global subscribers.
Underrepresentation of otherwise popular genres in specific countries indicates missed business opportunities.

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
# Pick relevant quantitative columns for correlation analysis (adjust as needed)
num_cols = ['imdb_score', 'tmdb_popularity', 'runtime', 'release_year']
df_corr = titles[num_cols].dropna()

# Compute the correlation matrix
corr = df_corr.corr()

# Plot the heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(
    corr,
    annot=True,              # Show correlation coefficient values
    fmt=".2f",               # 2 decimal places
    cmap='coolwarm',         # Blue = negative, red = positive correlation
    center=0,                # Center at zero
    square=True,             # Square cells
    linewidths=0.5,
    cbar_kws={'shrink': 0.75}
)
plt.title('Correlation Heatmap: IMDb Score, TMDb Popularity, Runtime & Release Year')
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

The correlation heatmap is uniquely suited for showing the strength and direction of linear relationships among several quantitative variables in one concise visual.

Color gradients immediately highlight positive (red) and negative (blue) associations, and the annotation allows rapid identification of strongly/weakly correlated pairs.

##### 2. What is/are the insight(s) found from the chart?

It shows which variable pairs have strong positive or negative correlations (values close to +1 or -1) and which are almost uncorrelated (near 0).

For example, the heatmap may show that imdb_score and tmdb_popularity are only weakly correlated (suggesting that higher ratings do not always mean higher popularity), that runtime is almost independent of release_year, or that some variables exhibit moderate association.

The absence of clear correlations may reveal true independence or signal the need to look for non-linear or more complex relationships.
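
One simple check for a monotonic but non-linear relationship is to compare Pearson with Spearman rank correlation, which `DataFrame.corr` supports through its `method` argument. The sketch below uses hypothetical values in which popularity grows exponentially while the score grows linearly.

```python
import pandas as pd

# Hypothetical data: monotonic but strongly non-linear relationship
sample = pd.DataFrame({
    "tmdb_popularity": [1, 2, 4, 8, 16, 32, 64],
    "imdb_score": [5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0],
})

# Pearson measures linear association
pearson = sample.corr(method="pearson").loc["tmdb_popularity", "imdb_score"]

# Spearman works on ranks, so it captures any monotonic association
spearman = sample.corr(method="spearman").loc["tmdb_popularity", "imdb_score"]

print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

A Spearman value noticeably higher than the Pearson value on the real columns would suggest that a low entry in the heatmap understates the relationship.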

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
# Select numerical columns of interest for pairwise relationships
num_cols = ['imdb_score', 'tmdb_popularity', 'runtime', 'release_year']

# Filter data to drop rows with missing values in these columns
df_pair = titles[num_cols].dropna()

# Create the pair plot
sns.pairplot(df_pair, diag_kind='kde', height=2.5, aspect=1)
plt.suptitle('Pair Plot of IMDb Score, TMDb Popularity, Runtime, and Release Year', y=1.02)
plt.show()


##### 1. Why did you pick the specific chart?

The pair plot provides a comprehensive overview of pairwise relationships and distributions among multiple quantitative variables simultaneously.

It visualizes scatter plots between variable pairs off-diagonal, helping identify correlations, patterns, clusters, or outliers.

The diagonal shows univariate distributions (here using kernel density estimates) to understand each variable's distribution.

##### 2. What is/are the insight(s) found from the chart?

It shows relationships between pairs of variables (IMDb score, TMDb popularity, runtime, release year), revealing whether variables correlate positively, negatively, or show no clear pattern.

The diagonal plots display the distribution shapes of each variable (e.g., skewness, peaks).

You can spot clusters of typical titles and outliers with unusual combinations (e.g., very popular but low-rated).

Generally, the plot may reveal weak to moderate correlations, indicating these features relate but are not strongly dependent.
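
The skewness visible on the diagonal can be quantified with `Series.skew()`. The sketch below also shows the effect of a `log1p` transform, which is one common option for heavily right-skewed variables before plotting; the values are hypothetical, and whether `tmdb_popularity` needs this treatment is an assumption to verify on the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed values standing in for titles['tmdb_popularity']
sample = pd.DataFrame({
    "tmdb_popularity": [1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144]
})

# Sample skewness: values well above 0 indicate a long right tail
raw_skew = sample["tmdb_popularity"].skew()

# log1p compresses large values and keeps zeros valid (log1p(0) == 0)
log_skew = np.log1p(sample["tmdb_popularity"]).skew()

print(f"Skewness before transform: {raw_skew:.2f}")
print(f"Skewness after log1p:      {log_skew:.2f}")
```

If the real column is similarly skewed, plotting the transformed values in the pair plot may make clusters in the low-popularity range easier to see.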

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

Based on our analysis of the Amazon Prime Video dataset, we recommend the following approach to effectively achieve your business objectives:

Optimize Content Acquisition and Production:
Our heatmap and bar plot analyses reveal which genres, release years, and runtimes resonate most with your audience. By focusing on acquiring and producing content in these high-performing areas—while also balancing across different age certifications and production countries—you can maximize viewer engagement and satisfaction.

Tailor Marketing and Audience Targeting:
Insights into genre popularity by country and actor prominence enable more precise marketing efforts. Targeted campaigns that promote popular genres and favored actors within specific regions or demographics will improve audience retention and growth.

Enhance Content Quality and Portfolio Decisions:
The correlation and pair plot results help identify key factors influencing content success. Avoid investing in content segments with lower ratings and popularity, and emphasize high-quality, well-rated titles most likely to attract and retain subscribers.

Promote Diversity and Localization:
Our geographic and genre distribution analyses highlight areas where content is concentrated or lacking. Expanding your offerings in underrepresented countries and genres can diversify the catalog, appealing to global and niche audiences and driving international growth.

Implement Ongoing Data-Driven Monitoring:
Continuous use of these analytical approaches will allow you to track shifts in viewer preferences and content performance. This agility supports timely adjustments in strategy, ensuring relevance and competitiveness in a dynamic market.

This data-driven framework ensures you allocate resources effectively, strengthen your content portfolio, and enhance personalized user experiences—all of which contribute directly to subscriber growth and retention.

# **Conclusion**

The analyses provide a robust, multi-dimensional understanding of Amazon Prime Video’s content portfolio and audience response patterns. The insights gained support a data-driven approach to:

Optimizing content acquisition and production by focusing on genres, runtimes, and countries with demonstrated viewer appeal and high ratings.

Enhancing marketing effectiveness through targeted campaigns that highlight popular genres in specific countries and leverage star actors connected with audience favorites.

Improving content quality and portfolio balance by identifying and addressing underperforming segments while nurturing high-value areas.

Expanding diversity and localization to capture non-dominant markets and underserved genres, fostering inclusive growth and global subscriber retention.

Sustaining agility by implementing continuous performance monitoring and real-time analysis pipelines to adapt strategy dynamically to evolving trends.

This project successfully illustrates how deep analytical exploration of streaming platform data can unlock vital business intelligence. By integrating such insights into operational decision-making, Amazon Prime Video can sharpen competitive advantage, delight diverse audiences, and drive sustainable subscriber growth in an increasingly crowded digital entertainment landscape.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***