To ensure accurate results, I entered the data into the code manually. Because the dataset is small, this caused no problems.

First, I analyze the relationship between the number of Netflix subscribers and TV viewers in Italy. Then I examine the correlation between Netflix revenue and TV viewership in Italy. With a dataset this small, the direction of the result can be guessed before running any analysis. I still compute the correlation as part of the statistical analysis, because for larger datasets the correlation cannot be predicted reliably by inspection; it has to be calculated to obtain clear and reliable results.
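
As a minimal sketch of what the correlation analysis computes, the Pearson coefficient can be worked out by hand for a single age group. The helper name `pearson_r` is hypothetical and not part of this notebook; the two lists are the '3-5 years' and Netflix subscriber series from the data in this notebook. It is meant only as an illustration of the formula, not as a replacement for `scipy.stats.pearsonr`.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y divided by the product of their spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the means (numerator)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (denominator terms)
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Values copied from the notebook's data for 2017-2022
tv_3_5 = [94.3, 92, 88.3, 87.7, 86.5, 83.7]
subscribers = [0.7, 1.4, 2, 3.9, 4.2, 5.4]

r = pearson_r(tv_3_5, subscribers)
print(f"r = {r:.4f}")  # prints "r = -0.9478"
```

A value close to -1 means the two series move in opposite directions almost linearly: as subscribers rise, the TV viewing figure for this group falls.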

In [1]:
import pandas as pd
from scipy.stats import pearsonr

# The TV viewers data
tv_viewers_data = {
    'Years': [2017, 2018, 2019, 2020, 2021, 2022],
    '3-5 years': [94.3, 92, 88.3, 87.7, 86.5, 83.7],
    '6-10 years': [93.1, 90.2, 87.2, 90.8, 85.4, 85.1],
    '11-14 years': [87.7, 81.3, 77.3, 77.1, 75.3, 70.8],
    '15-17 years': [77.9, 69.6, 67.5, 65, 63.7, 57.5],
    '18-19 years': [73.3, 66.8, 61.4, 67, 62, 54.8],
    '20-24 years': [72.7, 63.9, 61.7, 61.2, 60.6, 51.8],
    '25-34 years': [75.5, 72.1, 68.5, 68.1, 67.9, 66.4],
    '35-44 years': [81.5, 75.2, 72.9, 73.8, 74.8, 69.6],
    '45-54 years': [85.5, 80, 77.5, 79.1, 79.7, 76],
    '55-59 years': [89.4, 84.7, 82, 83.1, 85.1, 80.2],
    '60-64 years': [91.9, 88, 87.5, 88.1, 88.2, 86],
    '65-74 years': [95.4, 91.9, 91.9, 92, 91.5, 90],
    '75 years and over': [94.9, 92.5, 91.2, 90.4, 92.2, 92.3]
}

tv_viewers_df = pd.DataFrame(tv_viewers_data)

# Netflix subscribers data
netflix_subscribers_data = {
    'Years': [2017, 2018, 2019, 2020, 2021, 2022],
    'Netflix Subscribers (mln)': [0.7, 1.4, 2, 3.9, 4.2, 5.4]
}

netflix_subscribers_df = pd.DataFrame(netflix_subscribers_data)

# Merge the TV viewers and Netflix subscribers dataframes
df = pd.merge(tv_viewers_df, netflix_subscribers_df, on='Years')

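# Calculate the Pearson correlation for each age group.
# columns[1:-1] skips the first column ('Years') and the last ('Netflix Subscribers (mln)').
age_groups = df.columns[1:-1]
correlations = {}

for age_group in age_groups:
    # pearsonr returns (coefficient, p-value); only the coefficient is kept here
    correlation, _ = pearsonr(df[age_group], df['Netflix Subscribers (mln)'])
    correlations[age_group] = correlation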

# Print the correlations
for age_group, correlation in correlations.items():
    print(f"Correlation coefficient for {age_group}: {correlation}")


Correlation coefficient for 3-5 years: -0.9478484690868378
Correlation coefficient for 6-10 years: -0.7347104590938868
Correlation coefficient for 11-14 years: -0.9129616359781187
Correlation coefficient for 15-17 years: -0.9359186498906249
Correlation coefficient for 18-19 years: -0.7779464625462325
Correlation coefficient for 20-24 years: -0.8816219778932152
Correlation coefficient for 25-34 years: -0.8831634851448391
Correlation coefficient for 35-44 years: -0.7688582019877661
Correlation coefficient for 45-54 years: -0.7080198168200367
Correlation coefficient for 55-59 years: -0.6929300419471925
Correlation coefficient for 60-64 years: -0.7166850935654481
Correlation coefficient for 65-74 years: -0.7931147023177474
Correlation coefficient for 75 years and over: -0.5041902559928758
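
With only six yearly observations, it may help to check whether a coefficient is distinguishable from zero. The sketch below is an assumption-laden illustration, not part of the original analysis: it converts a few of the coefficients printed above into a t-statistic, t = r * sqrt((n - 2) / (1 - r^2)), and compares it with 2.776, the two-sided 5% critical value of Student's t with 4 degrees of freedom. The names `t_statistic`, `N_YEARS`, and `T_CRITICAL` are hypothetical.

```python
import math

N_YEARS = 6         # observations per series (2017-2022)
T_CRITICAL = 2.776  # two-sided 5% critical value, Student's t, 4 degrees of freedom

def t_statistic(r, n):
    """t-statistic for testing whether a Pearson coefficient differs from zero."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Three coefficients copied from the printed output above
examples = {
    '3-5 years': -0.9478484690868378,
    '6-10 years': -0.7347104590938868,
    '75 years and over': -0.5041902559928758,
}

significant = {}
for age_group, r in examples.items():
    t = t_statistic(r, N_YEARS)
    significant[age_group] = abs(t) > T_CRITICAL
    label = "significant" if significant[age_group] else "not significant"
    print(f"{age_group}: r = {r:.3f}, t = {t:.2f} ({label} at the 5% level)")
```

Under these assumptions, only coefficients with an absolute value above roughly 0.81 clear the threshold for n = 6, so the weaker correlations in the older age groups should be read cautiously.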


In [7]:
# Netflix revenue data
netflix_revenue_data = {
    'Years': [2017, 2018, 2019, 2020, 2021, 2022],
    'Netflix Revenue (mln)': [70.1, 119.67, 169.94, 222.14, 240.13, 259.82]
}

netflix_revenue_df = pd.DataFrame(netflix_revenue_data)

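# Merge on 'Years' so the two series are aligned by year rather than by row order
revenue_df = pd.merge(tv_viewers_df, netflix_revenue_df, on='Years')

# Calculate the Pearson correlation coefficient for each age group
correlation_coefficients = {}
for column in tv_viewers_df.columns[1:]:
    correlation, _ = pearsonr(revenue_df[column], revenue_df['Netflix Revenue (mln)'])
    correlation_coefficients[column] = correlation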

# Display correlation coefficients
for age_group, correlation in correlation_coefficients.items():
    print(f"Correlation coefficient for {age_group}: {correlation}")


Correlation coefficient for 3-5 years: -0.9738893598534737
Correlation coefficient for 6-10 years: -0.7976671107503059
Correlation coefficient for 11-14 years: -0.9531313680710716
Correlation coefficient for 15-17 years: -0.951346782568494
Correlation coefficient for 18-19 years: -0.8093126029109199
Correlation coefficient for 20-24 years: -0.8913281954169912
Correlation coefficient for 25-34 years: -0.9601437121854177
Correlation coefficient for 35-44 years: -0.8247413212310357
Correlation coefficient for 45-54 years: -0.7920007212660682
Correlation coefficient for 55-59 years: -0.758059505020021
Correlation coefficient for 60-64 years: -0.7833444013231486
Correlation coefficient for 65-74 years: -0.8444440288283632
Correlation coefficient for 75 years and over: -0.6585344257187009
