# Data Science Tools and Ecosystem

## Introduction

This notebook is the final assignment for the course. It surveys the data science ecosystem: the programming languages, libraries, and tools that data scientists use in their daily work. It closes with short Python examples of arithmetic expressions and a unit conversion.

## Data Science Languages

Some of the popular languages that Data Scientists use are:

1. **Python** - Most popular for data science due to its simplicity and extensive libraries
2. **R** - Specifically designed for statistical computing and graphics
3. **SQL** - Essential for database querying and data extraction
4. **Scala** - Used with Apache Spark for big data processing
5. **Java** - Useful for big data tools and enterprise applications
6. **Julia** - High-performance language for numerical analysis
7. **JavaScript** - Used for data visualization and web-based analytics
8. **C++** - Used for performance-critical applications
9. **Go** - Emerging language for data engineering
10. **SAS** - Traditional statistical software language

## Data Science Libraries

Some of the libraries commonly used by Data Scientists include:

### Python Libraries:
1. **NumPy** - Numerical computing with arrays
2. **Pandas** - Data manipulation and analysis
3. **Matplotlib** - Data visualization and plotting
4. **Seaborn** - Statistical data visualization
5. **Scikit-learn** - Machine learning algorithms
6. **TensorFlow** - Deep learning framework
7. **PyTorch** - Deep learning and neural networks
8. **Keras** - High-level neural networks API
9. **SciPy** - Scientific computing
10. **Plotly** - Interactive data visualization

### R Libraries:
1. **ggplot2** - Data visualization
2. **dplyr** - Data manipulation
3. **caret** - Classification and regression training
4. **randomForest** - Random forest algorithm
5. **shiny** - Web applications for R

## Data Science Tools

| Tool | Category | Description |
|------|----------|-------------|
| **Jupyter Notebook** | Development Environment | Interactive computing environment |
| **RStudio** | Development Environment | IDE for R programming |
| **Apache Spark** | Big Data Processing | Unified analytics engine |
| **Tableau** | Data Visualization | Business intelligence platform |
| **Power BI** | Data Visualization | Microsoft's business analytics tool |
| **Apache Kafka** | Data Streaming | Distributed streaming platform |
| **Docker** | Containerization | Platform for building and running applications in containers |
| **Git** | Version Control | Distributed version control system |
| **MySQL** | Database | Relational database management system |
| **MongoDB** | Database | NoSQL document database |
| **Hadoop** | Big Data Framework | Distributed storage and processing |
| **Airflow** | Workflow Management | Platform for workflow orchestration |

### Below are a few examples of evaluating arithmetic expressions in Python

Arithmetic expressions are fundamental in data science for performing calculations, statistical analysis, and data transformations. Python provides standard mathematical operators that allow us to perform various calculations efficiently.

In [1]:

```python
# This is a simple arithmetic expression to multiply then add integers
result = (3 * 4) + 5
print(f"(3 * 4) + 5 = {result}")
```

```text
(3 * 4) + 5 = 17
```


In [2]:

```python
# This will convert 200 minutes to hours by dividing by 60
minutes = 200
hours = minutes / 60
print(f"{minutes} minutes is equal to {hours} hours")
```

```text
200 minutes is equal to 3.3333333333333335 hours
```
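
A fractional number of hours is awkward to read as a duration. As a minimal sketch (the helper name `split_minutes` is hypothetical and not part of the assignment), Python's built-in `divmod` returns the floor quotient and the remainder in one call, which gives whole hours plus leftover minutes. The last line also illustrates that multiplication binds tighter than addition, so the parentheses in `(3 * 4) + 5` are optional.

```python
def split_minutes(total_minutes):
    """Return (hours, minutes) for a non-negative whole number of minutes."""
    # divmod(a, b) returns the pair (a // b, a % b)
    return divmod(total_minutes, 60)


hours, minutes = split_minutes(200)
print(f"200 minutes is {hours} hours and {minutes} minutes")

# Multiplication is evaluated before addition, with or without parentheses
print(3 * 4 + 5 == (3 * 4) + 5)
```

Using `/` keeps the exact fractional value, while `divmod` is the better fit when the result is meant to be displayed as hours and minutes.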


## Objectives

This notebook covers the following key learning objectives:

- List popular languages for Data Science
- Identify commonly used libraries in Data Science
- Understand various Data Science tools and development environments
- Create and execute basic arithmetic expressions in Python
- Perform a simple unit conversion (minutes to hours) in Python
- Demonstrate proficiency in using Jupyter Notebook for data science workflows

## Author

**Name:** Yash Dogra  
**Course:** Data Science Tools and Ecosystem  
**Date:** September 2025


## Sharing the Notebook (Exercise 12)

To complete Exercise 12, upload this notebook (`DataScienceEcosystem.ipynb`) to a **public GitHub repository** and provide the link.

**Steps to Share:**
1. Create (or open) a GitHub repository (public).
2. Click "Add file" → "Upload files" and select this notebook.
3. Commit the changes.
4. Open the notebook in the repo and copy the URL (it should look like: `https://github.com/<username>/<repo>/blob/main/DataScienceEcosystem.ipynb`).
5. (Optional) Also click the "Raw" button and ensure it loads; this confirms public access.
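
The URL pattern in step 4 and the "Raw" check in step 5 can be sketched in Python. This is an illustration only: the function names `github_urls` and `is_publicly_reachable`, and the `your-username` / `your-repo` values, are hypothetical placeholders, and the branch is assumed to be `main`. The reachability check makes a network request, so it is defined here but not called.

```python
import urllib.error
import urllib.request


def github_urls(username, repo, path, branch="main"):
    """Build the rendered ("blob") URL and the raw URL for a file in a repo."""
    blob = f"https://github.com/{username}/{repo}/blob/{branch}/{path}"
    raw = f"https://raw.githubusercontent.com/{username}/{repo}/{branch}/{path}"
    return blob, raw


def is_publicly_reachable(url, timeout=10):
    """Return True if an anonymous GET of the URL succeeds with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.URLError:
        # Covers HTTP errors such as 404 (private or missing) and network failures
        return False


# Placeholder values; substitute your own account and repository names
blob_url, raw_url = github_urls(
    "your-username", "your-repo", "DataScienceEcosystem.ipynb"
)
print(blob_url)
print(raw_url)
```

Calling `is_publicly_reachable(raw_url)` from a session that is not logged in to GitHub is one rough way to confirm that the repository is public.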

**Paste your public GitHub link here:**
`<ADD LINK HERE>`

**Verification Checklist:**
- Repository is public ✅
- Notebook renders on GitHub ✅
- All cells are visible ✅
- Code cells executed (outputs present) ✅
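
The last checklist item can be checked programmatically before uploading. The sketch below assumes the standard `.ipynb` JSON layout (a top-level `cells` list whose code cells carry `source` and `outputs` fields); the function name `unexecuted_code_cells` and the sample notebook are hypothetical. It is a heuristic: a cell that ran but printed nothing, such as a bare assignment, is also flagged.

```python
import json


def unexecuted_code_cells(notebook):
    """Return indexes of non-empty code cells that have no stored outputs."""
    missing = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        # "source" may be a single string or a list of lines
        source = "".join(cell.get("source", []))
        if source.strip() and not cell.get("outputs"):
            missing.append(index)
    return missing


# Small in-memory stand-in for a notebook file (illustrative data only)
sample = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {
            "cell_type": "code",
            "source": ["print(17)"],
            "execution_count": 1,
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["17\n"]}
            ],
        },
        {
            "cell_type": "code",
            "source": ["x = 1"],
            "execution_count": None,
            "outputs": [],
        },
    ]
}
print(unexecuted_code_cells(sample))  # [2]

# For a real file, load it first:
# with open("DataScienceEcosystem.ipynb", encoding="utf-8") as f:
#     print(unexecuted_code_cells(json.load(f)))
```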

## Screenshot Instructions (Exercise 13)

Take a screenshot of the **first page** of this notebook showing:
- The title
- Introduction
- Languages section (at least partially)
- The top of the next section, if possible

**How to Capture (macOS):** Press `Shift + Command + 4` then drag to select the visible area.

**Best Practices:**
- Make sure the code cells (Exercises 8 and 9) have been executed so that their outputs are visible before capturing.
- Use `.png` or `.jpg` format.
- Ensure text is readable (avoid zooming out too far).

After capturing, you'll upload the screenshot per the course submission instructions.