# Defining Data Science


> Data Science is an interdisciplinary field that leverages scientific methods, processes, algorithms, and systems to **extract knowledge and insights from data**. 

> It draws on a range of domains including statistics, machine learning, data mining, and domain-specific expertise to uncover patterns and inform decision-making in complex systems.


## Key Components of Data Science

The modern view of Data Science encompasses the following:


- Data Collection and Engineering: Involves acquiring, cleaning, transforming, and storing data. This may include web scraping, API use, and setting up pipelines.

- Exploratory Data Analysis (EDA): Using statistical summaries and visualizations to understand data distributions and detect anomalies or patterns (Tukey, 1977).

- Modeling and Algorithms: Applying machine learning or statistical models (e.g., regression, decision trees, neural networks) to make predictions or classifications.

- Evaluation and Validation: Assessing model performance using metrics such as accuracy, precision, recall, and AUC (Han et al., 2011).

- Deployment and Decision-Making: Integrating models into products or decision systems, often through APIs or dashboards.

- Communication: Using data storytelling and visualization to communicate findings effectively to stakeholders.
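
The EDA step above can be sketched with nothing but Python's standard library. The example below computes a basic statistical summary and flags anomalies with Tukey's fences (points more than 1.5 × IQR beyond the quartiles). The `daily_orders` data, the function names, and the choice of the `"inclusive"` quantile method are assumptions made for illustration; a real analysis would more likely use pandas and plots.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def summarize(values):
    """Basic statistical summary plus the points outside Tukey's fences."""
    lower, upper = tukey_fences(values)
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "outliers": [v for v in values if v < lower or v > upper],
    }


# Hypothetical daily order counts; 95 is a planted anomaly.
daily_orders = [12, 15, 14, 13, 16, 15, 14, 95, 13, 14]
summary = summarize(daily_orders)
print(summary)
```

With this sample, only the value 95 falls outside the fences. Note how far the mean sits from the median: a single anomaly is enough to distort it, which is one reason EDA looks at several summaries at once.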

![image.png](attachment:image.png)
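
To make the evaluation metrics listed above concrete, here is a minimal pure-Python sketch of accuracy, precision, recall, and AUC for a binary classifier. The labels, scores, and 0.5 threshold are made-up illustration values, and the function names are hypothetical; in practice a library such as scikit-learn would normally be used instead.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall from hard 0/1 predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def roc_auc(y_true, scores):
    """AUC as the chance a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    # Ties between a positive and a negative count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and model scores for eight examples.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.7, 0.3, 0.6, 0.2, 0.8, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Accuracy, precision, and recall depend on the chosen threshold, while AUC is computed from the raw scores and summarizes ranking quality across all thresholds.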

## Distinction from Related Fields


- Statistics provides theoretical foundations and inferential techniques but may not focus on computational scalability or domain deployment.

- Machine Learning emphasizes algorithmic prediction, often prioritizing predictive accuracy over interpretability.

- Big Data emphasizes volume, velocity, and variety, while Data Science emphasizes insight and value extraction.

## Applications

Data Science is used across diverse sectors:


- Healthcare (e.g., predictive diagnostics)

- Finance (e.g., fraud detection)

- Marketing (e.g., customer segmentation)

- Public Policy (e.g., crime prediction)

- Natural Language Processing (e.g., sentiment analysis)

## Story: The Pregnancy Prediction Algorithm


This widely reported story highlights both the **power and ethical challenges** of data science.

<details>

<summary>Click to expand</summary>

- As The New York Times reported in 2012, a father walked into a Target store in the United States, furious. He had discovered that his teenage daughter had been receiving coupons in the mail for baby clothes, cribs, and maternity products. He demanded to know why Target was encouraging teenage pregnancy.

- A few days later, the father called back to apologize. It turned out that his daughter was indeed pregnant—but she hadn’t told anyone yet.

</details>


- <mark>So how did Target know?</mark> 


<details>

<summary>Click to expand</summary>

- Behind the scenes, Target’s data scientists had built a pregnancy prediction model. They started by analyzing the shopping patterns of women who had signed up for baby registries. They found that buying certain items—like unscented lotion, calcium supplements, and cotton balls—tended to spike in the early stages of pregnancy. These subtle signals, when combined with machine learning, allowed Target to assign a "pregnancy prediction score" to thousands of female customers based on their purchase history.

- Once someone was flagged as likely pregnant, Target could send them personalized ads, even predicting their due date within a narrow window.

- It worked: the model was accurate. But it also crossed an ethical line. The incident sparked public outrage and led Target to change how it used such insights, hiding baby-related ads among unrelated promotions to avoid drawing attention.

- This story shows us the double-edged sword of data science: the ability to uncover hidden patterns—and the responsibility to use those insights with care.
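
To illustrate the general idea of turning purchase signals into a score, here is a toy sketch. It is not Target's model: the weights, the bias, the shopper baskets, and the use of a logistic function are all assumptions chosen for illustration. Only the three signal items come from the story above.

```python
import math

# Hypothetical weights for illustration only; not Target's actual model.
SIGNAL_WEIGHTS = {
    "unscented lotion": 1.2,
    "calcium supplements": 0.9,
    "cotton balls": 0.6,
}
BIAS = -2.5  # hypothetical baseline so that most shoppers score low


def prediction_score(purchases):
    """Map a purchase history to a score in (0, 1) with a logistic function."""
    # Each signal item counts once, no matter how often it was bought.
    z = BIAS + sum(SIGNAL_WEIGHTS.get(item, 0.0) for item in set(purchases))
    return 1.0 / (1.0 + math.exp(-z))


# Made-up shoppers and baskets.
shoppers = {
    "shopper_a": ["bread", "milk"],
    "shopper_b": ["unscented lotion", "bread"],
    "shopper_c": ["unscented lotion", "calcium supplements", "cotton balls"],
}
for name, basket in shoppers.items():
    print(name, round(prediction_score(basket), 3))
```

No single item says much on its own; the score rises only as several weak signals accumulate. That is also why such a model can reveal something a customer never disclosed.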
  
</details>


## Story: How Netflix Used Data Science to Beat Blockbuster

![image.png](attachment:image.png)

<details>

<summary>Click to expand</summary>


- In the early 2000s, **Blockbuster** dominated the home movie rental industry with over 9,000 stores, while Netflix was a small mail-order DVD company. 

- Blockbuster dismissed Netflix’s model—no late fees, DVDs by mail, and an online interface—as unserious and unscalable.

</details>

- <mark>But Netflix had a secret weapon: data.</mark>

<details>


<summary>Click to expand</summary>



- Netflix tracked every rental, user rating, and browsing behavior, giving it something Blockbuster lacked: **data**. This data formed the foundation of one of the most influential recommender systems in industry history.

- In 2006, Netflix launched the **Netflix Prize**, a **$1 million open competition** to improve its movie recommendation algorithm. The challenge was to beat Netflix’s in-house system, Cinematch, by at least 10% in predicting user ratings. This attracted thousands of data scientists, researchers, and hobbyists worldwide. After three years and more than 40,000 submissions, the winning team finally crossed the 10% improvement threshold in 2009 with an ensemble that blended the predictions of many models.
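
The Netflix Prize scored entries by root mean squared error (RMSE) on held-out ratings, so the 10% target meant a 10% lower RMSE than the baseline. The sketch below shows how RMSE, relative improvement, and a simple blend of two models can be computed. The ratings and both models' predictions are made up, and the equal-weight average is a deliberately simple stand-in, not the winning team's method.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between true and predicted ratings."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def blend(predictions, weights):
    """Weighted average of several models' predictions, rating by rating."""
    total = sum(weights)
    return [sum(w * p for w, p in zip(weights, row)) / total
            for row in zip(*predictions)]


def improvement(baseline_rmse, new_rmse):
    """Relative RMSE reduction; 0.10 corresponds to a 10% improvement."""
    return (baseline_rmse - new_rmse) / baseline_rmse


# Made-up true ratings and two hypothetical models with different errors.
actual = [4, 3, 5, 2, 4, 1]
model_a = [4.6, 3.4, 4.2, 2.8, 4.5, 1.9]
model_b = [3.5, 2.4, 4.9, 1.5, 3.2, 1.0]

blended = blend([model_a, model_b], weights=[1.0, 1.0])
best_single = min(rmse(actual, model_a), rmse(actual, model_b))
gain = improvement(best_single, rmse(actual, blended))
print(f"best single RMSE: {best_single:.3f}")
print(f"blended RMSE:     {rmse(actual, blended):.3f}")
print(f"improvement:      {gain:.1%}")
```

In this toy data the two models tend to err in opposite directions, so averaging cancels part of each model's error. That is the basic intuition behind ensembles.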


</details>


- <mark>What happened to **Blockbuster**?</mark>


<details>

<summary>Click to expand</summary>



- In 2010, **Blockbuster filed for bankruptcy**.

- As of early 2024, Netflix reported over 260 million subscribers, and the company has said its recommendation engine drives more than 80% of what members watch.

</details>



