
# Netflix War Movies Data Analysis

This Jupyter notebook analyzes a dataset of Netflix war movies. It examines the relationship between movie duration and user retention, and uses statistical tests to check whether the observed patterns are significant.



## Duration and Retention Analysis - Part 1

The first part of the analysis draws a scatter plot of movie duration against user retention, then computes the Pearson correlation coefficient to quantify the strength and direction of any linear relationship between the two.


In [None]:

import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import pearsonr

# Load the dataset
df = pd.read_csv('netflix_war_movies.csv')

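# Convert duration from a string such as '95 min' to an integer number of minutes
# (assumes every row follows the '<number> min' pattern and none is missing)
df['duration'] = df['duration'].str.replace(' min', '', regex=False).astype(int)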

# Simple scatter plot
plt.scatter(df['duration'], df['user_retention'])
plt.xlabel('Duration (minutes)')
plt.ylabel('User Retention (%)')
plt.title('Duration vs User Retention for War Movies on Netflix')
plt.show()

# Pearson correlation coefficient: measures linear association, from -1 to 1
correlation, p_value = pearsonr(df['duration'], df['user_retention'])
print(f'Pearson correlation coefficient: {correlation:.3f}, P-value: {p_value:.4g}')
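To make the coefficient less of a black box, the sketch below computes Pearson's r directly from its definition (the covariance of the two variables divided by the product of their standard deviations) and compares it with `pearsonr`. The `sample` DataFrame and its values are hypothetical stand-ins for `netflix_war_movies.csv`, not real data, and the regex-based duration parsing is one possible alternative to the string replacement used above.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

# Hypothetical sample standing in for netflix_war_movies.csv (values are made up)
sample = pd.DataFrame({
    'duration': ['95 min', '120 min', '142 min', '88 min', '165 min', '110 min'],
    'user_retention': [72.0, 65.5, 58.0, 75.5, 51.0, 68.0],
})

# Pull the leading integer out of strings such as '95 min'
sample['duration'] = sample['duration'].str.extract(r'(\d+)', expand=False).astype(int)

x = sample['duration'].to_numpy(dtype=float)
y = sample['user_retention'].to_numpy(dtype=float)

# Pearson r = sum((x - mean_x) * (y - mean_y)) / sqrt(sum((x - mean_x)^2) * sum((y - mean_y)^2))
x_centered = x - x.mean()
y_centered = y - y.mean()
r_manual = (x_centered * y_centered).sum() / np.sqrt(
    (x_centered ** 2).sum() * (y_centered ** 2).sum()
)

# Same quantity from SciPy, together with its two-sided p-value
r_scipy, p_value = pearsonr(x, y)

print(f'Manual r: {r_manual:.4f}, SciPy r: {r_scipy:.4f}, P-value: {p_value:.4g}')
```

The two values of r should agree up to floating-point error. Note that r only captures linear association, so a coefficient near zero does not rule out a curved relationship between duration and retention.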



## Testing Algorithms and Methods - Part 1

Here, we run an independent two-sample t-test to check whether mean user retention differs significantly between movies shorter than 2 hours (under 120 minutes) and movies that run 2 hours or longer.


In [None]:

from scipy.stats import ttest_ind

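# Split the dataset at the 2-hour mark: short is under 120 min, long is 120 min or more
short_movies = df[df['duration'] < 120]
long_movies = df[df['duration'] >= 120]

# Independent two-sample t-test (by default, ttest_ind assumes equal variances in both groups)
t_stat, p_val = ttest_ind(short_movies['user_retention'], long_movies['user_retention'])
print(f'T-test statistic: {t_stat:.3f}, P-value: {p_val:.4g}')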
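The test above uses the default settings of `ttest_ind`, which assume both groups share the same variance. If the short and long groups differ in size or spread, Welch's t-test (`equal_var=False`) is a common alternative. The sketch below is an illustration under stated assumptions rather than part of the original analysis: the two arrays are synthetic, randomly generated stand-ins for the `user_retention` columns, and the group sizes, means, and spreads are arbitrary.

```python
import numpy as np
from scipy.stats import ttest_ind

# Synthetic stand-ins for the two user_retention groups (not real data)
rng = np.random.default_rng(0)
short_retention = rng.normal(loc=70, scale=5, size=40)   # hypothetical short movies
long_retention = rng.normal(loc=60, scale=12, size=15)   # hypothetical long movies

# Student's t-test: pools the variance of both groups
t_student, p_student = ttest_ind(short_retention, long_retention)

# Welch's t-test: does not assume equal variances
t_welch, p_welch = ttest_ind(short_retention, long_retention, equal_var=False)

# Welch statistic from its definition, using sample variances (ddof=1)
standard_error = np.sqrt(
    short_retention.var(ddof=1) / short_retention.size
    + long_retention.var(ddof=1) / long_retention.size
)
t_manual = (short_retention.mean() - long_retention.mean()) / standard_error

print(f'Student: t={t_student:.3f}, p={p_student:.4g}')
print(f'Welch:   t={t_welch:.3f}, p={p_welch:.4g} (manual t={t_manual:.3f})')
```

To apply this to the notebook's data, the same call would take `short_movies['user_retention']` and `long_movies['user_retention']` in place of the synthetic arrays.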
