## **PPG12_s1 – DATA ANALYSIS AT NETFLIX**  
**Topics:** Time Series & Logistic Regression  
**Datasets Provided:**  
- `Weekly_Views.csv`  
- `Users.csv`

**Required Tools:** Python, Pandas, Plotly, Statsmodels, Scikit-learn

**GENERAL INSTRUCTIONS**

1. You must answer all questions using Python.
2. All visualizations must be created using **Plotly** (interactive plots).
3. Briefly justify every analytical or modeling decision.
4. Final submission: a `.ipynb` file and its export to `.pdf` or `.html`.

**PART 1: TIME SERIES ANALYSIS**

**Scenario:** You work on Netflix's content analytics team. Your task is to analyze the weekly performance of a new original series launched a year ago.


1. **Initial visualization (Plotly)**  
   Create an interactive line plot using Plotly to display weekly viewership trends. Identify and briefly describe any of the following patterns you observe:  
   - Trend  
   - Seasonality  
   - Randomness



2. **Time series decomposition**  
   Use classical decomposition (trend, seasonal, residual components) and display each component separately. Which component appears to dominate the series?



3. **Stationarity check**  
   Apply the Augmented Dickey-Fuller (ADF) test. Is the series stationary? If not, apply an appropriate transformation (e.g., differencing to remove a trend, or a log transform to stabilize the variance before differencing) and re-run the test.



4. **Modeling (ARIMA or SARIMA)**  
   Fit a model to forecast the next 4 weeks of viewership. Justify your model selection. Provide:  
   - Forecast results  
   - Confidence intervals  
   - Interactive visualization comparing historical and forecasted values

5. **Model validation**  
   Calculate a validation error metric (e.g., MAE or RMSE) on a hold-out set, for example the final weeks of the series withheld from fitting. How reliable is your model?
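
The error metrics and a chronological hold-out split can be sketched with NumPy alone. In the example below, the naive "repeat the last training value" forecast is a stand-in baseline and an assumption of this sketch; replace it with the forecast from your fitted model and compare the two. The hold-out size of 8 weeks is likewise arbitrary.

```python
import numpy as np

def mae(actual, predicted):
    """Mean absolute error."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean(np.abs(actual - predicted)))

def rmse(actual, predicted):
    """Root mean squared error."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def holdout_split(values, n_test):
    """Chronological split: the last n_test points are held out."""
    values = np.asarray(values, float)
    return values[:-n_test], values[-n_test:]

# Synthetic weekly views (stand-in for the real series).
rng = np.random.default_rng(0)
views = 1000 + np.cumsum(rng.normal(10, 40, 52))
train, test = holdout_split(views, n_test=8)

# Naive baseline (ASSUMPTION for illustration): repeat the last training value.
naive_forecast = np.full(len(test), train[-1])
baseline = {"MAE": mae(test, naive_forecast), "RMSE": rmse(test, naive_forecast)}
```

A model whose hold-out error is not clearly below this baseline adds little beyond "next week looks like last week".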



6. **Business strategy recommendation**  
   Based on the trends or peaks observed, propose two actionable strategies Netflix could use to increase engagement with this series.



**PART 2: LOGISTIC REGRESSION**

**Scenario:** You're part of Netflix's personalization team. You are building a model to predict the probability that a user will click on a recommendation based on their behavior.

7. **Exploratory visualization (Plotly)**  
   Use a Plotly scatter plot to examine the relationship between `exposure_time` and `clicked`, with points colored by `watched_trailer`. What insights do you gain?



8. **Logistic model construction**  
   Fit a logistic regression model using the following features:  
   - `age`  
   - `watch_time_per_week`  
   - `watched_trailer`  
   - `exposure_time`

   Display the model’s coefficients and explain the effect of each variable.



9. **Model evaluation**  
   Build and visualize the confusion matrix. Calculate and interpret the following metrics:  
   - Accuracy  
   - Sensitivity (Recall)  
   - Specificity



10. **ROC Curve and AUC (Plotly)**  
   Plot the ROC curve using Plotly. Report and interpret the AUC value. Is the model a good classifier?



11. **Individual prediction**  
    Estimate the click probability for a user with the following characteristics:  
    - Age: 35  
    - Watch time per week: 15 hours  
    - Watched trailer: Yes  
    - Exposure time: 20 minutes

    Should this user be predicted as a click (assuming a threshold of 0.5)?



12. **Interaction effects**  
   You suspect that the impact of `exposure_time` depends on whether the user watched the trailer. Create an interaction term and refit the model. Does the new model improve?



13. **Influential observations**  
   Calculate Cook’s distance. Are there any influential observations? How would they affect model interpretation?



14. **Multicollinearity check**  
   Assess multicollinearity between the explanatory variables. Is there redundancy? Suggest ways to address it.



15. **Strategic application**  
   Based on your analysis, propose a recommendation strategy that could increase the click-through rate for suggested titles.



**SUBMISSION CHECKLIST**

- Jupyter Notebook (`PPG12_s1.ipynb`)  
- Export to PDF or HTML  
- All plots must be interactive using Plotly  
- Code must include comments and clear interpretations