# Data Science Assignment

## This is the final assignment of the data science course and will be graded by peers.

### Data Science Languages

- **Python**: Widely used for data manipulation, analysis, and machine learning through libraries like Pandas, NumPy, SciPy, and scikit-learn.
- **R**: Primarily used for statistical analysis, visualization, and machine learning with packages like ggplot2, dplyr, and caret.
- **SQL**: Essential for managing and querying relational databases to extract and manipulate data.
- **Julia**: Known for its speed and performance in scientific computing and data analysis.
- **Scala**: Used with Apache Spark for distributed computing and processing large-scale datasets.
- **MATLAB**: Popular in academia and industry for numerical computing, data analysis, and visualization.
- **JavaScript (with libraries like D3.js)**: Used for interactive data visualization in web applications.
- **SAS**: Long used for statistical analysis and data management, especially in industries like healthcare and finance.
- **Java**: Employed in big data processing frameworks like Hadoop and for developing enterprise-level applications in data science.
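The SQL entry can be illustrated without installing anything. Here is a minimal sketch that runs SQL from Python through the standard-library `sqlite3` module against a throwaway in-memory database. The `sales` table, its columns, and its values are hypothetical and exist only for illustration.

```python
import sqlite3

# Open a throwaway in-memory SQLite database (all data below is hypothetical)
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create a small table and insert a few rows using parameter placeholders
cur.execute("CREATE TABLE sales (region TEXT, amount REAL)")
cur.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 30.0)],
)

# Extract and aggregate: total sales per region, largest first
cur.execute(
    "SELECT region, SUM(amount) AS total FROM sales "
    "GROUP BY region ORDER BY total DESC"
)
totals = cur.fetchall()
print(totals)  # [('north', 150.0), ('south', 80.0)]

conn.close()
```

The `?` placeholders let `sqlite3` handle quoting of values, which is safer than building SQL strings by hand.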


### Data Science Libraries

#### Python
- **Pandas**: For data manipulation and analysis, offering data structures such as `DataFrame` and `Series` along with operations for working with tabular and time-series data.
- **NumPy**: Fundamental package for scientific computing, providing support for arrays and matrices.
- **SciPy**: Builds on NumPy, adding modules for optimization, integration, interpolation, linear algebra, signal processing, and statistics.
- **scikit-learn**: Machine learning library providing simple and efficient tools for data mining and analysis.
- **TensorFlow / Keras**: For building and training machine learning models, especially neural networks.
- **PyTorch**: Deep learning library known for its flexibility and ease of use.
- **Matplotlib**: Comprehensive library for creating static, interactive, and animated visualizations in Python.
- **Seaborn**: Data visualization library based on Matplotlib, offering a high-level interface for statistical graphics.

#### R
- **ggplot2**: R's famous data visualization package known for its grammar of graphics.
- **dplyr**: For data manipulation tasks with a focus on clarity and ease of understanding.
- **caret**: Unified interface to multiple machine learning algorithms in R.

#### Others
- **SQLAlchemy**: SQL toolkit and Object-Relational Mapping (ORM) for Python, enabling interaction with SQL databases.
- **Apache Spark**: Distributed computing engine that provides APIs for data manipulation, querying, and analysis at scale.
- **Hadoop**: Framework for distributed storage and processing of large datasets.

### Data Science Tools

| Tool | Description | Associated Languages |
|------|-------------|----------------------|
| **Jupyter Notebook** | Interactive development environment for creating and sharing documents that combine live code, equations, visualizations, and narrative text. | Python, R, Julia, others |
| **RStudio** | Integrated development environment (IDE) designed specifically for R programming. | R |
| **Spyder** | Python IDE designed for scientific computing, data analysis, and visualization. | Python |
| **Visual Studio Code** | Highly customizable code editor with extensions supporting various data science tasks. | Multiple |
| **Tableau** | Data visualization tool for creating interactive, shareable dashboards. | - |
| **Power BI** | Business analytics tool for creating interactive visualizations and business intelligence reports. | - |
| **KNIME** | Open-source data analytics, reporting, and integration platform. | Java |
| **RapidMiner** | ETL (Extract, Transform, Load) and data mining software. | Java |
| **SAS** | Suite of analytics solutions used for statistical analysis and predictive modeling. | - |
| **Apache Hadoop** | Framework for distributed storage and processing of large datasets. | Java |
| **Apache Spark** | Fast cluster computing system for big data processing. | Scala, Java, Python, R |
| **Databricks** | Unified analytics platform for big data and machine learning. | Scala, Python, R |
| **Alteryx** | Data blending and advanced analytics platform. | - |
| **IBM Watson Studio** | Integrated environment to prepare, analyze, visualize, and model data. | Python, R, Scala |
| **Google Colab** | Cloud-based Jupyter notebook environment provided by Google. | Python |
| **Azure Machine Learning** | Cloud-based machine learning service by Microsoft. | Python, R |
| **AWS SageMaker** | Managed service by Amazon for building, training, and deploying machine learning models. | Python |


### Arithmetic Expression Examples

Arithmetic expressions involve mathematical operations such as addition, subtraction, multiplication, division, and more. They are fundamental to mathematical computations and are extensively used in programming and data science.

#### Basic Arithmetic Operators

- **Addition (+)**: Combining two or more numbers. For instance, `3 + 5 = 8`.
- **Subtraction (-)**: Finding the difference between two numbers. For example, `10 - 4 = 6`.
- **Multiplication (*)**: Performing repeated addition of a number. For instance, `4 * 3 = 12`.
- **Division (/)**: Splitting a number into equal parts. For example, `20 / 5 = 4`.
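In Python the four operators behave as listed, with one detail worth knowing: `/` always produces a float, so `20 / 5` evaluates to `4.0` rather than `4`. The sketch below reproduces the examples above; floor division (`//`) is included only for contrast.

```python
# Basic arithmetic operators, using the same values as the list above
addition = 3 + 5          # 8
subtraction = 10 - 4      # 6
multiplication = 4 * 3    # 12
division = 20 / 5         # 4.0: in Python 3, "/" always returns a float
floor_division = 20 // 5  # 4: "//" divides and rounds down; int operands give an int

print(addition, subtraction, multiplication, division, floor_division)
# 8 6 12 4.0 4
```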

#### Additional Operators

- **Exponentiation (`**` or `^` in some languages)**: Raising a number to the power of another. For instance, `2 ** 3 = 8` (2 raised to the power of 3).
- **Modulo (%)**: Finding the remainder of a division operation. For example, `17 % 5 = 2` (17 divided by 5 leaves a remainder of 2).
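A quick check of both operators in Python. Note the caveat in the heading: languages such as R, MATLAB, and Julia use `^` for powers, but in Python `^` is bitwise XOR, so `2 ^ 3` gives `1`, not `8`.

```python
# Exponentiation and modulo in Python
power = 2 ** 3      # 8
remainder = 17 % 5  # 2

# "^" is NOT exponentiation in Python: it is bitwise XOR (0b10 ^ 0b11 == 0b01)
xor_result = 2 ^ 3  # 1

# divmod returns the quotient and the remainder together
quotient, rem = divmod(17, 5)  # 3, 2

# With a negative operand, Python's % result takes the sign of the divisor
negative_mod = -17 % 5  # 3, because -17 == 5 * (-4) + 3

print(power, remainder, xor_result, quotient, rem, negative_mod)
# 8 2 1 3 2 3
```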

#### Order of Operations (PEMDAS/BODMAS)

Arithmetic expressions follow an order of operations:
1. **Parentheses (Brackets)**: Perform operations within parentheses first.
2. **Exponents (Indices)**: Evaluate exponentiation.
3. **Multiplication and Division (from left to right)**: Perform these operations next.
4. **Addition and Subtraction (from left to right)**: Lastly, compute addition and subtraction operations.
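Python follows these rules, with one exception worth knowing: exponentiation groups from right to left and binds more tightly than a leading minus sign. The values below are arbitrary and chosen only to show each rule.

```python
# Multiplication binds tighter than addition
a = 2 + 3 * 4    # 14
# Parentheses override the default order
b = (2 + 3) * 4  # 20
# Operators of equal precedence are evaluated left to right
c = 20 / 5 * 2   # (20 / 5) * 2 = 8.0
d = 10 - 4 - 3   # (10 - 4) - 3 = 3
# Exception: ** groups right to left, so this is 2 ** (3 ** 2)
e = 2 ** 3 ** 2  # 512
# ** also binds tighter than unary minus
f = -2 ** 2      # -(2 ** 2) = -4

print(a, b, c, d, e, f)
# 14 20 8.0 3 512 -4
```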

```python
# Define numbers
num1 = 10
num2 = 5
num3 = 3

# Perform multiplication and addition
result_multiplication = num1 * num2 * num3  # Multiply three numbers
result_addition = num1 + num2 + num3        # Add three numbers

# Display results
print(f"Multiplication result: {result_multiplication}")
print(f"Addition result: {result_addition}")
```

Output:

```
Multiplication result: 150
Addition result: 18
```
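When the number of values is not fixed, the built-in `sum()` and `math.prod()` (available since Python 3.8) give the same results as chaining `+` and `*`. The list name `numbers` is just an illustrative choice.

```python
import math

# Same values as above, stored in a list so the count can vary
numbers = [10, 5, 3]

total = sum(numbers)          # 10 + 5 + 3
product = math.prod(numbers)  # 10 * 5 * 3 (math.prod requires Python 3.8+)

print(f"Multiplication result: {product}")
print(f"Addition result: {total}")
```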


```python
# Define minutes
minutes = 150

# Convert minutes to hours
hours = minutes / 60

# Display the result
print(f"{minutes} minutes is equal to {hours} hours")
```

Output:

```
150 minutes is equal to 2.5 hours
```
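The float result `2.5` is correct, but a duration often reads better as whole hours plus leftover minutes. A small variant of the conversion combines floor division with the modulo operator through `divmod()`, which returns both parts at once.

```python
# Split a duration into whole hours and leftover minutes
minutes = 150

# divmod performs floor division and modulo in one step
hours, remaining_minutes = divmod(minutes, 60)

print(f"{minutes} minutes is {hours} hours and {remaining_minutes} minutes")
# 150 minutes is 2 hours and 30 minutes
```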


### Objectives of Data Science

1. **Extract Insights**: Analyze and extract meaningful insights and patterns from large and complex datasets to aid in decision-making processes.

2. **Predictive Analysis**: Develop models and algorithms to predict future trends, behaviors, or outcomes based on historical data, enabling proactive actions.

3. **Data-driven Decision Making**: Empower organizations to make informed decisions supported by data, reducing uncertainty and improving efficiency.

4. **Optimization**: Optimize processes, systems, and strategies by leveraging data analysis to enhance performance and achieve better outcomes.

5. **Identify Patterns and Trends**: Discover hidden patterns, correlations, and trends within data that might not be evident at first glance.

6. **Machine Learning Implementation**: Use machine learning algorithms to automate tasks, classify data, and build predictive models.

## Author: Azhan Aslam
