# TV Shows Dataset Analysis
## A Comprehensive Data Analytics Portfolio Project

**Author:** Data Analyst Portfolio  
**Date:** January 2026  
**Dataset:** 2,500+ TV Shows across multiple networks and platforms

---

## Executive Summary

This project analyzes a comprehensive dataset of 2,500+ television shows to uncover insights about content trends, viewer preferences, and industry patterns. Through data cleaning, exploratory data analysis, and visualization, we identify key factors that contribute to show success and longevity.

### Key Findings:
- **Quality Content**: 60 shows rated 8.5+ (exceptional), with Breaking Bad leading at 9.2/10
- **Content Diversity**: 28 unique genres with Drama (40.3%) and Comedy (27.9%) dominating
- **Network Insights**: ABC leads in volume (204 shows), while Showtime leads in quality (7.86 avg rating)
- **Global Reach**: 173 non-English shows (6.9%), with Japanese anime as a significant segment
- **Longevity Champions**: Game shows and soap operas dominate episode counts (Jeopardy!: 9,090 episodes)

---

## 1. Business Question & Objective

### Primary Questions:
1. What characteristics define highly-rated television shows?
2. Which networks produce the highest quality content?
3. How do content types (Scripted, Reality, Animation) differ in performance?
4. What patterns exist in show longevity and episode production?
5. How does international content perform compared to English-language shows?

### Project Objectives:
- Clean and prepare the dataset for analysis
- Identify trends and patterns in TV show characteristics
- Provide actionable insights for content creators and network executives
- Demonstrate proficiency in data analytics skills (Python, pandas, data visualization)

---

## 2. Data Collection & Preparation

### Dataset Overview:
- **Source**: TV Shows dataset with 2,565 initial records
- **Columns**: 17 features including Name, Rating, Network, Genres, Runtime, Episodes, Cast, etc.
- **Time Period**: Shows from various decades (1950s - 2020s)

### Data Cleaning Process:

#### Issues Identified:
1. **Duplicate Records**: 46 duplicate show names (remakes, reboots, international versions)
2. **Missing Values**: Varying levels across columns (ratings, networks, premiere dates)
3. **Data Types**: Genres stored as string representations of lists

#### Cleaning Steps Implemented:
1. Removed 46 duplicate show names (kept first occurrence)
2. Verified no exact duplicate rows
3. Analyzed missing value patterns
4. Parsed genre strings into list format for analysis
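
The cleaning steps above could be sketched as follows. This is a minimal illustration on a tiny made-up frame, not the project's actual `clean_tv_shows.py`. The column names `Name`, `Rating`, and `Genres` come from the column list above, while the sample rows and the helper `parse_genres` are hypothetical.

```python
import ast

import pandas as pd

# Hypothetical sample rows; the real dataset has 2,565 records and 17 columns.
raw = pd.DataFrame(
    {
        "Name": ["Show A", "Show B", "Show A", "Show C"],
        "Rating": [8.1, None, 7.0, 6.5],
        "Genres": ["['Drama', 'Crime']", "['Comedy']", "['Drama']", "[]"],
    }
)


def parse_genres(value):
    """Turn a string such as "['Drama', 'Crime']" into a Python list."""
    if not isinstance(value, str):
        return []
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return []
    return parsed if isinstance(parsed, list) else []


# Step 1: keep the first occurrence of each show name.
clean = raw.drop_duplicates(subset="Name", keep="first").reset_index(drop=True)

# Step 2: confirm that no exact duplicate rows remain.
exact_duplicates = int(clean.duplicated().sum())

# Step 3: count missing values per column.
missing_per_column = clean.isna().sum()

# Step 4: parse the genre strings into lists.
clean["Genres"] = clean["Genres"].apply(parse_genres)
```

The duplicate-row check runs before the genres are parsed because lists are not hashable, so `duplicated()` would fail on a column of lists.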

### Final Dataset:
- **Total Shows**: 2,519 unique TV shows
- **Shows with Ratings**: 1,745 (69.3%)
- **Complete Records**: Varying by feature
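
The counts in this section can be cross-checked with a few lines of arithmetic. The figures below are the ones reported above; the variable names are illustrative only.

```python
initial_records = 2565
duplicate_names_removed = 46
rated_shows = 1745

# Unique shows left after dropping duplicate names.
total_shows = initial_records - duplicate_names_removed

# Share of shows that have a rating, and the number that do not.
rated_share = rated_shows / total_shows * 100
unrated_shows = total_shows - rated_shows

print(f"{total_shows} shows, {rated_share:.1f}% rated, {unrated_shows} unrated")
# prints "2519 shows, 69.3% rated, 774 unrated"
```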

---

## 3. Exploratory Data Analysis

### 3.1 Ratings Analysis

**Summary Statistics:**
- Average Rating: **7.34 / 10**
- Median Rating: **7.50 / 10**
- Standard Deviation: **0.84**
- Range: 3.0 - 9.2

**Top 10 Highest-Rated Shows:**
1. Breaking Bad (9.2) - AMC
2. Firefly (9.0) - FOX
3. Avatar: The Last Airbender (8.9) - Nickelodeon
4. Sherlock (8.9) - BBC One
5. Attack on Titan (8.9) - NHK
6. The Wire (8.9) - HBO
7. One Piece (8.9) - Fuji TV
8. Gravity Falls (8.9) - Disney XD
9. Band of Brothers (8.9) - HBO
10. Game of Thrones (8.9) - HBO

**Key Insight**: HBO dominates the top-rated list with exceptional drama series, while anime shows (Attack on Titan, One Piece) perform exceptionally well.
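
A minimal sketch of how summary statistics and a top-rated list like the ones above might be produced with pandas. The rows are made up, and the column names `Name` and `Rating` are assumed to match the dataset.

```python
import pandas as pd

# Hypothetical ratings; None marks a show without a rating.
shows = pd.DataFrame(
    {
        "Name": ["Show A", "Show B", "Show C", "Show D", "Show E"],
        "Rating": [9.0, 7.5, None, 6.0, 8.5],
    }
)

# Unrated shows are excluded from every statistic.
rated = shows.dropna(subset=["Rating"])

# Mean, median, standard deviation, and range of the rated shows.
summary = rated["Rating"].agg(["mean", "median", "std", "min", "max"])

# Highest-rated shows, best first.
top_rated = rated.nlargest(3, "Rating")[["Name", "Rating"]]

# Number of shows at or above the 8.5 "exceptional" threshold.
exceptional_count = int((rated["Rating"] >= 8.5).sum())
```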

![Ratings Distribution](docs/images/ratings_overview.png)

---

### 3.2 Genre Analysis

**Top 5 Genres by Frequency:**
1. Drama - 1,015 shows (40.3%)
2. Comedy - 703 shows (27.9%)
3. Action - 426 shows (16.9%)
4. Crime - 387 shows (15.4%)
5. Adventure - 287 shows (11.4%)

**Average Rating by Primary Genre** (minimum 15 shows):
- Drama: **7.54**
- Action: **7.46**
- Adventure: **7.24**
- Comedy: **7.21**

**Key Insight**: Drama shows not only dominate in quantity but also maintain the highest average rating, making them the most reliable genre for quality content.
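
The two genre views above could be computed roughly as shown here. This sketch assumes `Genres` has already been parsed into lists and that a show's "primary" genre is the first one listed; the report does not state how the primary genre was chosen, so treat that as an assumption. The rows are made up.

```python
import pandas as pd

# Hypothetical rows; Genres is assumed to be already parsed into lists.
shows = pd.DataFrame(
    {
        "Name": ["A", "B", "C", "D"],
        "Rating": [8.0, 7.0, 6.0, None],
        "Genres": [["Drama", "Crime"], ["Drama"], ["Comedy"], ["Comedy", "Drama"]],
    }
)

# Frequency: one row per (show, genre) pair, then count shows per genre.
genre_counts = shows["Genres"].explode().value_counts()
genre_share = (genre_counts / len(shows) * 100).round(1)

# Assumption: the primary genre is the first genre listed for a show.
shows["PrimaryGenre"] = shows["Genres"].str[0]

MIN_SHOWS = 2  # the report uses a minimum of 15
by_genre = (
    shows.dropna(subset=["Rating"])
    .groupby("PrimaryGenre")["Rating"]
    .agg(["mean", "count"])
)
avg_by_genre = by_genre.loc[by_genre["count"] >= MIN_SHOWS, "mean"]
```

Because a show can carry several genres, the frequency shares are computed against the number of shows and can add up to more than 100%, as they do in the list above.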

![Genre Analysis](docs/images/genre_analysis.png)

---

### 3.3 Network Analysis

**Top 5 Networks by Show Count:**
1. ABC - 204 shows
2. NBC - 173 shows
3. CBS - 133 shows
4. FOX - 108 shows
5. BBC One - 103 shows

**Top 5 Networks by Quality** (minimum 20 shows):
1. Showtime - **7.86** (27 shows)
2. BBC One - **7.74** (80 shows)
3. HBO - **7.72** (56 shows)
4. Channel 4 - **7.71** (28 shows)
5. BBC Two - **7.68** (29 shows)

**Key Insights**:
- **Volume vs Quality Trade-off**: Major broadcast networks (ABC, NBC, CBS) produce high volumes but lower average ratings (7.24-7.30)
- **Premium Content**: Premium cable networks (Showtime, HBO) maintain higher quality with smaller catalogs
- **BBC Excellence**: BBC networks balance both volume and quality effectively
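
The volume and quality rankings could be built along these lines. Note that BBC One appears with 103 shows in the volume list and 80 in the quality list; one plausible reading, which this sketch assumes, is that the quality table counts only shows that have a rating. The network names and rows are placeholders.

```python
import pandas as pd

# Hypothetical rows; network names are placeholders.
shows = pd.DataFrame(
    {
        "Network": ["Net A", "Net A", "Net A", "Net B", "Net B", "Net C"],
        "Rating": [7.0, 7.5, None, 8.0, 8.5, 9.0],
    }
)

# Volume: every show counts, rated or not.
show_counts = shows["Network"].value_counts()

# Quality: only rated shows count toward the average and the threshold.
MIN_RATED = 2  # the report uses a minimum of 20
per_network = (
    shows.dropna(subset=["Rating"])
    .groupby("Network")["Rating"]
    .agg(avg_rating="mean", rated_shows="count")
)
quality = per_network[per_network["rated_shows"] >= MIN_RATED].sort_values(
    "avg_rating", ascending=False
)
```

The minimum-count filter keeps a network with one highly rated show (here `Net C`) from topping the ranking.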

![Network Analysis](docs/images/network_analysis.png)

---

### 3.4 Content Type Analysis

**Distribution of Content Types:**
- Scripted: 1,506 shows (59.8%)
- Reality: 471 shows (18.7%)
- Animation: 209 shows (8.3%)
- Documentary: 204 shows (8.1%)
- Other: 129 shows (5.1%)

**Average Rating by Content Type:**
1. Scripted - **7.48**
2. Documentary - **7.37**
3. Animation - **7.29**
4. Reality - **6.39**

**Key Insights**:
- Scripted content dominates both volume and quality
- Reality TV rates markedly lower (more than 1 point below Scripted)
- Documentaries and Animation perform well despite smaller representation
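
The gap between Scripted and Reality can be worked out directly from the averages in the table above. The dictionary below simply restates those reported values.

```python
# Average ratings as reported in the table above.
avg_rating = {
    "Scripted": 7.48,
    "Documentary": 7.37,
    "Animation": 7.29,
    "Reality": 6.39,
}

# Absolute gap in rating points, and the gap relative to the Reality average.
gap_points = avg_rating["Scripted"] - avg_rating["Reality"]
gap_percent = gap_points / avg_rating["Reality"] * 100

print(f"Scripted leads Reality by {gap_points:.2f} points ({gap_percent:.0f}%)")
# prints "Scripted leads Reality by 1.09 points (17%)"
```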

![Content Type Analysis](docs/images/content_type.png)

---

### 3.5 Show Longevity Analysis

**Longevity Statistics:**
- Average Seasons: **4.68**
- Median Seasons: **3**
- Average Episodes: **106.24**
- Median Episodes: **28**

**Longest-Running Shows (by seasons):**
1. Later... with Jools Holland - 63 seasons
2. Days of Our Lives - 56 seasons
3. Question Time - 54 seasons
4. Emmerdale - 51 seasons
5. Saturday Night Live - 50 seasons

**Most Episodes Produced:**
1. Jeopardy! - 9,090 episodes
2. Emmerdale - 8,110 episodes
3. Wheel of Fortune - 7,967 episodes
4. EastEnders - 6,850 episodes
5. Goede Tijden, Slechte Tijden - 6,375 episodes

**Key Insights**:
- Game shows and soap operas dominate longevity metrics
- Daily/weekly formats enable massive episode counts
- Talk shows maintain consistent multi-decade runs
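
The mean episode count reported above (106.24) sits far above the median (28), which is the pattern a handful of very long-running shows would produce. A minimal sketch with made-up episode counts shows the effect:

```python
import statistics

# Hypothetical episode counts: mostly short runs plus one long-running daily show.
episodes = [10, 12, 20, 24, 28, 30, 45, 60, 9000]

mean_episodes = statistics.mean(episodes)
median_episodes = statistics.median(episodes)

# The single outlier pulls the mean up while the median stays at a typical value.
print(f"mean={mean_episodes:.1f}, median={median_episodes}")
```

For skewed measures like this, the median is usually the better description of a typical show.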

![Longevity Analysis](docs/images/longevity_analysis.png)

---

### 3.6 Runtime Analysis

**Runtime Statistics:**
- Average Runtime: **49.76 minutes**
- Median Runtime: **60 minutes**

**Most Common Runtimes:**
1. 60 minutes - 1,197 shows (47.5%)
2. 30 minutes - 763 shows (30.3%)
3. 25 minutes - 58 shows (2.3%)

**Average Runtime by Content Type:**
- Sports: **119 minutes**
- Documentary: **57 minutes**
- Reality: **52 minutes**
- Scripted: **51 minutes**
- Animation: **26 minutes**

**Key Insights**:
- Industry standard: 30-minute sitcoms and 60-minute dramas
- Animation runs much shorter (26 minutes on average, roughly half the Scripted average)
- Premium content (HBO, Showtime) often exceeds standard runtimes

---

### 3.7 Language & International Content

**Language Distribution:**
- English: 2,346 shows (93.1%)
- Japanese: 85 shows (3.4%)
- Dutch: 24 shows (1.0%)
- Spanish: 21 shows (0.8%)
- Other: 43 shows (1.7%)

**Average Rating by Language** (minimum 5 shows):
- Japanese: **7.42**
- English: **7.33**

**Key Insights**:
- Dataset heavily English-language focused (93%)
- Japanese-language content (which includes anime) rates slightly above the English-language average (7.42 vs 7.33)
- International content represents a growth opportunity

---

### 3.8 Correlation Analysis

**Key Correlations:**

**Seasons vs Episodes:** +0.665 (Strong positive)
- Expected: More seasons = more episodes

**Rating vs Seasons:** +0.166 (Weak positive)
- Slight tendency: Higher-rated shows run longer

**Rating vs Episodes:** +0.098 (Very weak positive)
- Episode count not strongly tied to quality

**Runtime vs Rating:** +0.089 (Very weak positive)
- Runtime has minimal impact on ratings

**Key Insight**: Quality (rating) is relatively independent of show length, runtime, or episode count. Content quality matters more than quantity.
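
A correlation matrix like the one summarized above might be computed as follows. The report does not say which coefficient was used, so Pearson (the pandas default) is an assumption here, and the values are made up purely so the example runs.

```python
import pandas as pd

# Hypothetical values chosen only to make the example runnable.
shows = pd.DataFrame(
    {
        "Rating": [8.0, 7.0, None, 6.5, 9.0],
        "Seasons": [5, 3, 2, 1, 8],
        "Episodes": [60, 30, 20, 10, 80],
        "Runtime": [60, 30, 30, 60, 60],
    }
)

# Pearson correlation; pandas drops missing values pair by pair.
numeric_columns = ["Rating", "Seasons", "Episodes", "Runtime"]
corr = shows[numeric_columns].corr(method="pearson")

seasons_vs_episodes = corr.loc["Seasons", "Episodes"]
```

The resulting `corr` frame is square and symmetric, so it can be passed directly to a heatmap function such as `seaborn.heatmap(corr, annot=True)`.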

![Correlation Heatmap](docs/images/correlation_heatmap.png)

---

## 4. Key Insights & Recommendations

### Major Findings:

#### 1. Content Quality Patterns
- **Premium networks win on quality**: Showtime (7.86), BBC One (7.74), and HBO (7.72) all average above 7.7
- **Scripted drama dominates**: Best combination of volume and ratings
- **Reality TV underperforms**: Consistently lowest ratings (6.4 avg)

#### 2. Success Factors
- **Genre matters**: Drama and Action genres perform best
- **Network positioning**: Cable/premium vs broadcast shows clear quality divide
- **Content type**: Scripted content outperforms reality by 17%

#### 3. Industry Trends
- **Longevity ≠ Quality**: Game shows have massive episode counts but average ratings
- **International content opportunity**: Japanese anime shows high performance
- **Standard formats persist**: 30-min and 60-min runtimes dominate

### Recommendations:

#### For Content Creators:
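1. **Focus on scripted drama**: Drama has the highest average rating among the major genres (7.54), and Scripted leads all content types (7.48)
2. **Consider premium distribution**: Premium networks such as Showtime and HBO average higher ratings than the major broadcast networks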
3. **Quality over quantity**: Rating independence from episode count suggests focus on individual episode quality

#### For Network Executives:
1. **Invest in fewer, higher-quality shows**: Showtime model (27 shows, 7.86 avg) vs ABC (204 shows, 7.24 avg)
2. **Expand international content**: Japanese anime and European shows underrepresented but perform well
3. **Reconsider reality TV strategy**: Lowest-rated content type may need format innovation

#### For Viewers/Analysts:
1. **Use genre as filter**: Drama/Action genres most reliable for quality
2. **Network as quality signal**: BBC, HBO, Showtime consistently deliver
3. **Don't judge by length**: Show longevity is only weakly correlated with rating (+0.166 for seasons, +0.098 for episodes)

---

## 5. Methodology & Technical Skills Demonstrated

### Tools & Technologies:
- **Python 3.13**: Primary programming language
- **pandas**: Data manipulation and analysis
- **NumPy**: Numerical computing
- **Matplotlib & Seaborn**: Data visualization
- **Jupyter Notebook**: Documentation and presentation

### Data Analytics Skills:
1. **Data Cleaning**:
   - Duplicate removal
   - Missing value analysis
   - Data type transformation

2. **Exploratory Data Analysis**:
   - Descriptive statistics
   - Distribution analysis
   - Correlation analysis
   - Groupby aggregations

3. **Data Visualization**:
   - Histograms and distributions
   - Bar charts and horizontal bars
   - Pie charts
   - Correlation heatmaps
   - Multi-subplot layouts

4. **Business Intelligence**:
   - KPI identification
   - Insight extraction
   - Actionable recommendations
   - Stakeholder communication

### Project Structure:
```
tv-shows-analysis-portfolio/
├── docs/
│   ├── images/                        # Generated charts (6 files)
│   ├── index.html                     # Project documentation
│   └── style.css                      # Styling
├── clean_tv_shows.py                  # Data cleaning script
├── generate_charts.py                 # Visualization script
├── TV_Shows_Analysis_Portfolio.ipynb  # This notebook
├── README.md                          # Project overview
├── LICENSE                            # MIT License
└── requirements.txt                   # Dependencies
```

---

## 6. Limitations & Future Work

### Current Limitations:
1. **Missing Values**: 774 shows (30.7%) lack ratings
2. **Temporal Bias**: Dataset doesn't normalize for era (older vs newer shows)
3. **Geographic Bias**: 93% English-language content limits global insights
4. **Rating Source**: Single rating source may not represent all viewer segments

### Future Enhancements:
1. **Time Series Analysis**: Analyze rating trends over premiere years
2. **Cast/Crew Impact**: Correlate actors/directors with show success
3. **Genre Combinations**: Analyze multi-genre shows (Drama-Crime vs Drama-Comedy)
4. **Network Evolution**: Track how network quality changed over decades
5. **Predictive Modeling**: Build ML model to predict show success based on features
6. **Sentiment Analysis**: Analyze show summaries for themes and patterns
7. **Schedule Analysis**: Examine optimal air days/times for different content types
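
As a starting point for the time series idea, ratings could be grouped by premiere decade. This is only a sketch: `Premiered` is an assumed column name for the premiere date, and the rows are made up.

```python
import pandas as pd

# Hypothetical rows; "Premiered" is an assumed column name for the premiere date.
shows = pd.DataFrame(
    {
        "Name": ["A", "B", "C", "D"],
        "Premiered": ["1995-09-01", "1999-01-15", "2011-04-17", None],
        "Rating": [7.0, 8.0, 9.0, 6.0],
    }
)

# Unparseable or missing dates become NaT and are dropped below.
shows["PremiereYear"] = pd.to_datetime(shows["Premiered"], errors="coerce").dt.year
dated = shows.dropna(subset=["PremiereYear"]).copy()

# Round each year down to the start of its decade, e.g. 1999 -> 1990.
dated["Decade"] = (dated["PremiereYear"] // 10 * 10).astype(int)

avg_by_decade = dated.groupby("Decade")["Rating"].mean()
```

Comparing shows within a decade rather than across the whole dataset would also help with the temporal bias noted under Current Limitations.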

---

## 7. Conclusion

This comprehensive analysis of 2,519 television shows reveals clear patterns in content quality and success factors:

### Key Takeaways:

1. **Quality Beats Quantity**: Premium networks produce fewer shows but maintain significantly higher ratings

2. **Genre Matters**: Drama and action genres consistently deliver the highest-rated content

3. **Content Type Divide**: Scripted content substantially outperforms reality TV (7.48 vs 6.39)

4. **International Opportunity**: Non-English content, particularly Japanese anime, demonstrates strong performance despite limited representation

5. **Longevity Independence**: Show success (rating) is only weakly related to length (seasons/episodes), emphasizing episode quality over series quantity

### Portfolio Value:

This project demonstrates end-to-end data analytics capabilities:
- Data acquisition and cleaning
- Statistical analysis and pattern recognition
- Professional data visualization
- Business insight extraction
- Clear communication of findings

The insights generated can guide content strategy decisions for networks, inform viewer choices, and provide benchmark data for industry analysis.

---

**Thank you for reviewing this portfolio project!**

*For questions or collaboration opportunities, please connect with me.*

---