# **CH 01: INTRODUCTION TO DATA SCIENCE**

---

## **1. What is data science?**

**`Data science is the field of study that uses different methods to collect and analyze large amounts of data to find hidden insights. This information can then be used to make informed decisions and solve real-world problems.`**

##### _**(a) What can data do?**_

Data can:

* **Describe:** Show the current state of an organization or process using dashboards and reports.

* **Detect:** Find unusual events, such as fraudulent purchases or suspicious activity.

* **Diagnose:** Identify the causes of events and behaviors, such as why customers are leaving.

* **Predict:** Forecast future events, such as how likely a customer is to buy a product.
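
The **Detect** item above can be illustrated with a minimal sketch. It assumes a simple z-score rule: a purchase is "unusual" if it lies more than a chosen number of standard deviations from the mean. The function name `flag_unusual`, the threshold, and the purchase amounts are all hypothetical. Real fraud detection uses far richer signals than this.

```python
from statistics import mean, stdev

def flag_unusual(amounts, threshold=2.0):
    """Return the amounts whose z-score magnitude exceeds the threshold.

    Hypothetical helper: the z-score rule and the default threshold
    are illustrative assumptions, not a production fraud detector.
    """
    mu = mean(amounts)
    sigma = stdev(amounts)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [a for a in amounts if abs(a - mu) / sigma > threshold]

# Invented purchase amounts; one is far larger than the rest.
purchases = [20.0, 25.0, 22.0, 19.0, 24.0, 21.0, 23.0, 500.0]
print(flag_unusual(purchases))  # [500.0]
```

A single extreme value also inflates the mean and standard deviation, so on small samples a more robust rule (for example, one based on the median) may be preferable.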



##### _**(b) The Data Science Workflow**_


Step | Explanation | Example
---------|----------|---------
 **Data Collection and Storage** | In this step, data scientists collect data from a variety of sources, such as databases, surveys, and social media. The data can be structured (e.g., a spreadsheet of customer data) or unstructured (e.g., text reviews of products). | A data scientist at a retail company might collect data on customer purchases, product ratings, and website traffic.
**Data Preparation & Cleaning** | Once the data has been collected, it needs to be prepared for analysis. This may involve cleaning the data to remove errors and inconsistencies, transforming the data into a format that is compatible with the chosen analysis tools, and merging data from multiple sources. | The data scientist might remove any duplicate purchase records, convert all currency values to USD, and merge the customer purchase data with the product rating data.
 **Data Exploration & Visualization** | In this step, data scientists use a variety of tools to explore the data and identify patterns and trends. They may also create visualizations, such as charts and graphs, to communicate their findings to others. | The data scientist might use a data visualization tool to create a bar chart showing the top-selling products by category.
**Experimentation & Prediction** | In the final step, data scientists build and evaluate predictive models. These models can be used to forecast future events, such as which customers are most likely to churn or which products are most likely to sell well. | The data scientist might build a machine learning model to predict which customers are likely to churn within the next month. This model could then be used to target these customers with retention campaigns.

> Note: The data science workflow is not always linear. Data scientists may need to go back and forth between steps as they learn more about the data and the problem they are trying to solve.
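
The Data Preparation & Cleaning example from the table (remove duplicates, convert currencies to USD, merge purchases with ratings) might look roughly like the sketch below. Everything here is assumed for illustration: the field names, the `prepare` function, the exchange rates, and the rule that `order_id` uniquely identifies a purchase record.

```python
# Hypothetical exchange rates to USD; real rates would come from a trusted source.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.1}

# Invented sample data: purchase records and a product-rating lookup.
purchases = [
    {"order_id": 1, "product_id": "A", "amount": 10.0, "currency": "USD"},
    {"order_id": 1, "product_id": "A", "amount": 10.0, "currency": "USD"},  # duplicate
    {"order_id": 2, "product_id": "B", "amount": 20.0, "currency": "EUR"},
]
ratings = {"A": 4.5, "B": 3.8}

def prepare(purchases, ratings, rates):
    """Deduplicate purchases, convert amounts to USD, and attach ratings."""
    seen = set()
    cleaned = []
    for row in purchases:
        # Assumption: order_id uniquely identifies a purchase record.
        if row["order_id"] in seen:
            continue
        seen.add(row["order_id"])
        cleaned.append({
            "order_id": row["order_id"],
            "product_id": row["product_id"],
            # Convert to USD and round to cents.
            "amount_usd": round(row["amount"] * rates[row["currency"]], 2),
            # Merge step: look up the rating for this product (None if missing).
            "rating": ratings.get(row["product_id"]),
        })
    return cleaned

for record in prepare(purchases, ratings, RATES_TO_USD):
    print(record)
```

In practice this kind of work is usually done with a data-frame library, but the three steps stay the same.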

---

## **2. Applications of Data Science**

Three notable application areas of data science are:

1. Traditional Machine Learning
2. Internet of Things (IoT)
3. Deep Learning

Areas of Data Science | Explanation | Example
---------|----------|---------
 **Traditional Machine Learning** | Traditional machine learning (TML) is a type of artificial intelligence (AI) that uses algorithms to learn from data and make predictions. TML algorithms are typically trained on labeled data, where the input data is paired with the desired output. | A TML algorithm could be trained on a dataset of images of cats and dogs, labeled with the correct animal in each image. Once the algorithm is trained, it could be used to predict whether a new image contains a cat or a dog.
 **Internet of Things** | The Internet of Things (IoT) is a network of physical devices that are connected to the internet and can collect and exchange data. IoT devices are used in a wide range of industries, including healthcare, manufacturing, and transportation. | An IoT smart thermostat can collect data on the temperature in a room and adjust the heating or cooling accordingly.
 **Deep Learning** | Deep learning (DL) is a type of machine learning that uses artificial neural networks to learn from data. DL algorithms are able to learn complex patterns that are difficult for traditional machine learning algorithms to learn. | A DL algorithm could be used to develop a system that can automatically identify cancer cells in medical images.
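
The smart thermostat example in the table can be sketched as a small decision rule. This is only an illustration: the function `thermostat_action`, the tolerance band, and the temperature readings are hypothetical, and a real device would read from a sensor and send its data over a network.

```python
def thermostat_action(reading_c, target_c, tolerance_c=0.5):
    """Decide what to do given a temperature reading in degrees Celsius.

    Hypothetical rule: stay idle within a tolerance band around the target
    so the device does not switch on and off constantly.
    """
    if reading_c < target_c - tolerance_c:
        return "heat"
    if reading_c > target_c + tolerance_c:
        return "cool"
    return "idle"

# Invented sensor readings collected over time.
readings = [18.2, 20.8, 21.1, 23.4]
actions = [thermostat_action(r, target_c=21.0) for r in readings]
print(actions)  # ['heat', 'idle', 'idle', 'cool']
```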

#### _**(a) Difference between Machine Learning and Deep Learning:**_

**Imagine you are trying to teach a computer to recognize different types of flowers:**


Machine Learning | Deep Learning 
---------|----------
 With TML, you would create a dataset of flower images, each labeled with the type of flower. | With DL, you would typically still start from labeled flower images, and usually need many more of them.
 You would choose descriptive features by hand (e.g., petal color or petal shape) and train a TML algorithm on them. | You would train a neural network directly on the raw images, with no hand-designed features.
 Once the algorithm is trained, you could use it to predict the type of flower in a new image. | The DL algorithm would learn to identify different types of flowers by automatically extracting features from the images, usually at a higher cost in data and computing power.
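
The Machine Learning column can be sketched in a few lines. This sketch assumes each flower has already been reduced to two hand-measured features (petal length and petal width, in centimeters) and uses a nearest-centroid classifier, which is just one simple choice of algorithm. The flower names, measurements, and function names are invented for illustration.

```python
from math import dist

# Invented labeled dataset: (petal length cm, petal width cm) -> flower type.
labeled_flowers = [
    ((1.4, 0.2), "daisy"),
    ((1.5, 0.3), "daisy"),
    ((4.5, 1.5), "tulip"),
    ((4.7, 1.4), "tulip"),
]

def train(examples):
    """Compute the average feature vector (centroid) for each label."""
    sums = {}
    for features, label in examples:
        total, count = sums.get(label, ((0.0,) * len(features), 0))
        total = tuple(t + f for t, f in zip(total, features))
        sums[label] = (total, count + 1)
    return {
        label: tuple(t / count for t in total)
        for label, (total, count) in sums.items()
    }

def predict(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

centroids = train(labeled_flowers)
print(predict(centroids, (1.3, 0.2)))  # daisy
print(predict(centroids, (4.9, 1.6)))  # tulip
```

A deep learning approach would skip the hand-measured features and learn its own from the image pixels, which is why it generally needs more data and computing power.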

---

#### _**(b) Roles in the Data Science Industry**_

1. Data Engineer
2. Data Analyst
3. Data Scientist
4. Machine Learning Engineer

**Data engineer**

* Builds and maintains the infrastructure for data collection, storage, and processing.

* Ensures that data is accessible and secure.

* Develops and automates data pipelines.

* **Data Science Workflow:** Works in Data Collection and Storage

**Data scientist**

* Develops and applies machine learning algorithms to solve complex problems.

* Uses data to identify patterns and trends.

* Builds and evaluates predictive models.

* Communicates insights to stakeholders.

* **Data Science Workflow:** Works in Data Preparation & Cleaning, Data Exploration & Visualization, and Experimentation & Prediction

**Machine learning engineer**

* Builds and deploys machine learning models to production environments.

* Develops and maintains machine learning pipelines.

* Optimizes machine learning models for performance and scalability.

* **Data Science Workflow:** Works in Experimentation & Prediction

**Data analyst**

* Collects, cleans, and analyzes data to identify patterns and trends.

* Communicates insights to stakeholders using data visualizations and reports.

* Supports data scientists and machine learning engineers in their work.

* **Data Science Workflow:** Works in Data Preparation & Cleaning and Data Exploration & Visualization