# SDS Project: Exploratory Data Analysis

This Python Jupyter notebook explores how polarity and subjectivity differ across news sources. By plotting subjectivity against polarity and color-coding the data points by news source, we can visually compare the potential biases and perspectives of different media outlets. The sentiment scores give a rough quantitative measure of how objective each source's reporting is, which complements a qualitative reading of contemporary news coverage.

First, let's load the processed data that we collected from News Sentiment Analysis.

In [None]:
# Basic imports for the assignment: NumPy for numerics, pandas for tabular data
import numpy as np
import pandas as pd

In [None]:
# Load the processed data: one CSV per news dataset
business_df = pd.read_csv("business_news_processed.csv")
us_news_df = pd.read_csv("us_news_processed.csv")
tech_df = pd.read_csv("tech_news_processed.csv")
world_df = pd.read_csv("world_news_processed.csv")
commodities1_df = pd.read_csv("commodities1_processed.csv")
commodities2_df = pd.read_csv("commodities2_processed.csv")
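The plotting cell below relies on the `Content Subjectivity` and `Content Polarity` columns, so it can help to check each loaded file before plotting. The sketch uses a hypothetical helper, `check_sentiment_columns`, and assumes TextBlob-style score ranges (subjectivity in [0, 1], polarity in [-1, 1]). The notebook does not state which tool produced the scores, so adjust `EXPECTED_RANGES` if yours differ.

```python
import pandas as pd

# Column names come from the plotting cell; the ranges are an ASSUMPTION
# (TextBlob-style scores: subjectivity in [0, 1], polarity in [-1, 1]).
EXPECTED_RANGES = {
    "Content Subjectivity": (0.0, 1.0),
    "Content Polarity": (-1.0, 1.0),
}


def check_sentiment_columns(df: pd.DataFrame, name: str) -> dict:
    """Hypothetical helper: report missing columns, NaNs and out-of-range scores."""
    report = {
        "name": name,
        "rows": len(df),
        "missing_columns": [],
        "nan_counts": {},
        "out_of_range": {},
    }
    for col, (low, high) in EXPECTED_RANGES.items():
        if col not in df.columns:
            report["missing_columns"].append(col)
            continue
        values = df[col]
        report["nan_counts"][col] = int(values.isna().sum())
        # between() is inclusive on both ends; NaNs are excluded here so they
        # are counted only once, in nan_counts
        outside = ~values.between(low, high) & values.notna()
        report["out_of_range"][col] = int(outside.sum())
    return report
```

For example, `check_sentiment_columns(business_df, "Business")` returns a dict whose `missing_columns` list should be empty and whose `nan_counts` and `out_of_range` values should all be zero for clean data; repeat the call for the other five DataFrames.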

### Scatter Plot - Polarity vs. Subjectivity of News Sources

In [None]:
import altair as alt

# Altair raises an error for datasets over 5,000 rows by default; lift that limit
alt.data_transformers.disable_max_rows()

# Business dataframe (each chart plots a random 10% sample of the rows)
business_sample = business_df.sample(frac=0.1)
scatter_business = alt.Chart(business_sample).mark_circle(size=60).encode(
    x='Content Subjectivity',
    y='Content Polarity',
    tooltip=['Content Subjectivity', 'Content Polarity']
).interactive().properties(
    title='Business Scatterplot: Content Subjectivity vs Content Polarity'
)
scatter_business.display()

# US News dataframe
us_news_sample = us_news_df.sample(frac=0.1)
scatter_us_news = alt.Chart(us_news_sample).mark_circle(size=60).encode(
    x='Content Subjectivity',
    y='Content Polarity',
    tooltip=['Content Subjectivity', 'Content Polarity']
).interactive().properties(
    title='US News Scatterplot: Content Subjectivity vs Content Polarity'
)
scatter_us_news.display()

# Tech dataframe
tech_sample = tech_df.sample(frac=0.1)
scatter_tech = alt.Chart(tech_sample).mark_circle(size=60).encode(
    x='Content Subjectivity',
    y='Content Polarity',
    tooltip=['Content Subjectivity', 'Content Polarity']
).interactive().properties(
    title='Tech Scatterplot: Content Subjectivity vs Content Polarity'
)
scatter_tech.display()

# World dataframe
world_sample = world_df.sample(frac=0.1)
scatter_world = alt.Chart(world_sample).mark_circle(size=60).encode(
    x='Content Subjectivity',
    y='Content Polarity',
    tooltip=['Content Subjectivity', 'Content Polarity']
).interactive().properties(
    title='World Scatterplot: Content Subjectivity vs Content Polarity'
)
scatter_world.display()

# Commodities 1 dataframe
commodities1_sample = commodities1_df.sample(frac=0.1)
scatter_commodities1 = alt.Chart(commodities1_sample).mark_circle(size=60).encode(
    x='Content Subjectivity',
    y='Content Polarity',
    tooltip=['Content Subjectivity', 'Content Polarity']
).interactive().properties(
    title='Commodities 1 Scatterplot: Content Subjectivity vs Content Polarity'
)
scatter_commodities1.display()

# Commodities 2 dataframe
commodities2_sample = commodities2_df.sample(frac=0.1)
scatter_commodities2 = alt.Chart(commodities2_sample).mark_circle(size=60).encode(
    x='Content Subjectivity',
    y='Content Polarity',
    tooltip=['Content Subjectivity', 'Content Polarity']
).interactive().properties(
    title='Commodities 2 Scatterplot: Content Subjectivity vs Content Polarity'
)
scatter_commodities2.display()
