<a href="https://colab.research.google.com/github/comparativechrono/Principles-of-Data-Science/blob/main/intro_chapter/Section_7_Tools_and_Software_Setup__Python%2C_Numpy%2C_Pandas%2C_Scikit_learn%2C_and_Matplotlib.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Section 7 Tools and Software Setup

**Please** note that the following is a guide for a fresh setup. If you are running this notebook in Colab, please skip straight to **Installing Libraries**.

For students working through "Principles of Data Science," a correctly configured set of tools is essential for reproducing the results in this book. This section explains how to set up Python and the essential libraries NumPy, pandas, scikit-learn, and Matplotlib, each a cornerstone of the data scientist's toolbox. With this setup in place, you can work efficiently through the practical exercises and projects throughout the book.

## Python Installation:

Python is the primary language used in this book due to its versatility and the vast array of libraries it supports for data science. The latest version of Python can be downloaded from the official Python Software Foundation website at python.org. We recommend installing Python 3.x, as it includes significant improvements and optimizations over Python 2.x, which reached end of life in January 2020 and is no longer supported.
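
Once Python is installed, a quick check from within Python confirms which interpreter you are actually running. The following is a minimal sketch using only the standard library; the helper name `is_python3` is hypothetical and chosen purely for illustration.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given version tuple belongs to Python 3.

    `is_python3` is an illustrative helper, not part of any library.
    """
    return version_info[0] == 3


# Report the interpreter version in major.minor.micro form.
major, minor, micro = sys.version_info[:3]
print("Running Python {}.{}.{}".format(major, minor, micro))

if not is_python3():
    print("This book assumes Python 3; please install it from python.org.")
```

The script uses `str.format` rather than f-strings so that it still runs, and can print its warning, on an old Python 2 interpreter.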

## Anaconda Distribution:

To simplify the installation of Python and its libraries, it is advisable to use the Anaconda distribution. Anaconda is a distribution of Python (and R), free for individual use, that aims to simplify package management and deployment. It includes a suitable version of Python and most of the libraries that this book uses. Anaconda can be downloaded from anaconda.com.

After installing Anaconda, you can create a virtual environment specifically for the projects in this book. This is a best practice that helps manage dependencies and avoid conflicts between projects. You can create a new environment using the following command in the Anaconda Prompt or your terminal:

In [None]:
conda create --name datascience python=3.x

Replace 3.x with the specific Python 3 version you wish to use (for example, 3.11). Once the environment is created, activate it with:

In [None]:
conda activate datascience
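
To confirm that the activation took effect, you can inspect the running interpreter from Python itself. This is a minimal sketch, assuming the `CONDA_DEFAULT_ENV` environment variable that conda sets on activation; outside a conda environment (for example in Colab) it may be unset, in which case the reported value is `None`. The function name `describe_environment` is hypothetical.

```python
import os
import sys


def describe_environment():
    """Collect basic facts about the interpreter that is currently running.

    `describe_environment` is an illustrative helper, not a conda API.
    """
    return {
        # Full path of the Python executable in use.
        "executable": sys.executable,
        # Root directory of the active Python installation or environment.
        "prefix": sys.prefix,
        # Name of the active conda environment, or None if conda has not set it.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


info = describe_environment()
for key, value in info.items():
    print(f"{key}: {value}")
```

If the environment was activated before Python was launched, `conda_env` should read `datascience` and `prefix` should point inside that environment's directory.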

## Installing Libraries:

With the environment activated, install the primary libraries used in data science projects:

NumPy: Provides N-dimensional arrays and fast numerical operations on them.

Pandas: Provides high-performance, easy-to-use data structures and data analysis tools.

Scikit-learn: Offers simple and efficient tools for predictive data analysis.

Matplotlib: A plotting library for creating static, interactive, and animated visualizations in Python.

To install these libraries, use the following command:

In [None]:
conda install numpy pandas scikit-learn matplotlib

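Anaconda's base environment typically includes these libraries by default, but a newly created environment starts with little more than Python itself, so this command installs them into the active environment.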
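
After the install finishes, you may want to confirm that each package is visible to the current interpreter without importing it. The sketch below uses the standard library's `importlib.metadata` (available from Python 3.8); the helper name `installed_versions` and the `PACKAGES` list are hypothetical names introduced for this example.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as they are passed to the install command.
PACKAGES = ["numpy", "pandas", "scikit-learn", "matplotlib"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing.

    `installed_versions` is an illustrative helper, not a library function.
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


versions = installed_versions(PACKAGES)
for name, ver in versions.items():
    print(f"{name}: {ver if ver else 'not installed'}")
```

Any package reported as not installed points to the install command having been run in a different environment from the one Python is using.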

## Integrated Development Environment (IDE):

While you can use any text editor to write Python code, an Integrated Development Environment (IDE) or a code notebook can enhance your coding experience with features like code completion, syntax highlighting, and direct execution. Popular choices include:

Jupyter Notebook: Provides a web-based interactive computing platform. Jupyter notebooks allow you to create and share documents that contain live code, equations, visualizations, and narrative text. This format is ideal for this book because it makes it easy to combine instructional content with code. To install Jupyter via Anaconda, use:

In [None]:
conda install jupyter

To run Jupyter Notebook, enter the command `jupyter notebook` in your Anaconda Prompt or terminal.

Visual Studio Code (VS Code): A lightweight but powerful source code editor that runs on your desktop and is available for Windows, macOS, and Linux. Python support is added by installing the Python extension, which provides features such as IntelliSense (code completion) and debugging, and other extensions support data science work.

VS Code can be downloaded from code.visualstudio.com.
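
If the `jupyter` command is not recognized when you try to launch a notebook, it helps to check whether the executable is on your `PATH`. The following is a small sketch using the standard library's `shutil.which`; the helper name `find_command` is hypothetical, and the check reflects only the environment from which Python was started.

```python
import shutil


def find_command(name):
    """Return the full path of a command on PATH, or None if it is not found.

    `find_command` is an illustrative wrapper around shutil.which.
    """
    return shutil.which(name)


jupyter_path = find_command("jupyter")
if jupyter_path:
    print(f"jupyter found at: {jupyter_path}")
else:
    print("jupyter not found on PATH; is the datascience environment active?")
```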

## Testing the Setup:

After setting up your environment and tools, it’s wise to test whether everything works correctly. You can do this by running a simple Python script to check the versions of the installed libraries. In your Python environment or Jupyter notebook, execute the following code:

In [None]:
import numpy
import pandas
import sklearn
import matplotlib

print(f"Numpy version: {numpy.__version__}")
print(f"Pandas version: {pandas.__version__}")
print(f"Scikit-learn version: {sklearn.__version__}")
print(f"Matplotlib version: {matplotlib.__version__}")
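
Version numbers show that the libraries import, but not that they work. As an optional further step, the sketch below runs one tiny operation per library and reports the outcome. The function names (`check_numpy`, `run_checks`, and so on) are hypothetical, and the Matplotlib check assumes version 3.1 or later, where a `Figure` can be saved without going through `pyplot`.

```python
import io


def check_numpy():
    import numpy as np
    # Sum of 0..5 arranged as a 2x3 array.
    assert np.arange(6).reshape(2, 3).sum() == 15


def check_pandas():
    import pandas as pd
    df = pd.DataFrame({"x": [1, 2, 3], "y": [2, 4, 6]})
    assert df["y"].mean() == 4


def check_sklearn():
    from sklearn.linear_model import LinearRegression
    # Fit y = 2x on three points; the slope should come out close to 2.
    model = LinearRegression().fit([[1], [2], [3]], [2, 4, 6])
    assert abs(model.coef_[0] - 2.0) < 1e-6


def check_matplotlib():
    # Using Figure directly avoids changing the notebook's plotting backend.
    from matplotlib.figure import Figure
    fig = Figure()
    ax = fig.subplots()
    ax.plot([1, 2, 3], [2, 4, 6])
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png")  # render to memory, not to disk
    assert buffer.getbuffer().nbytes > 0


CHECKS = {
    "numpy": check_numpy,
    "pandas": check_pandas,
    "scikit-learn": check_sklearn,
    "matplotlib": check_matplotlib,
}


def run_checks(checks=CHECKS):
    """Run each check and record 'ok' or a short failure description."""
    results = {}
    for name, check in checks.items():
        try:
            check()
            results[name] = "ok"
        except Exception as exc:  # includes ImportError for missing packages
            results[name] = f"failed: {exc!r}"
    return results


results = run_checks()
for name, status in results.items():
    print(f"{name}: {status}")
```

Each check runs independently, so one missing or broken library does not hide the status of the others.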

This setup, built around a dedicated environment, gives you a reproducible base for the data science projects and exercises presented in this book, so you can focus on learning data science concepts and techniques without unnecessary disruptions.