<img src="data/images/lecture-notebook-header.png" />

# Preparation: Setup Check

This notebook checks whether the most important packages needed for this course are installed. If none of the code cells below raises an error, you should be ready to go. If you get an error stating that a package is missing, install it using `pip`, `conda`, or `mamba`, depending on your Python installation and package manager.

Note that individual notebooks might require the import of additional packages. Here, we cover only the most common and frequently used ones.
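
As an optional alternative to running each import cell one by one, the check can be sketched in a single cell using only the standard library. The helper name `check_packages` and the `PACKAGES` mapping from import names to distribution names are assumptions made for this illustration, not part of the course material; adjust them to your setup.

```python
import importlib.util
from importlib import metadata

# Import names used in this notebook, mapped to their distribution (pip) names.
# This mapping is an assumption for illustration; adjust it if your setup differs.
PACKAGES = {
    "numpy": "numpy",
    "pandas": "pandas",
    "sklearn": "scikit-learn",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "networkx": "networkx",
    "efficient_apriori": "efficient-apriori",
}


def check_packages(packages):
    """Return {import_name: version string, or None if the package is not installed}."""
    status = {}
    for import_name, dist_name in packages.items():
        # find_spec locates a top-level module without importing it.
        if importlib.util.find_spec(import_name) is None:
            status[import_name] = None
            continue
        try:
            status[import_name] = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            # Importable, but no distribution metadata under that name.
            status[import_name] = "unknown"
    return status


for name, version in check_packages(PACKAGES).items():
    print(f"{name:20s} {version if version else 'MISSING'}")
```

Any package reported as `MISSING` can then be installed with your package manager before continuing.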

## Python Installation

We recommend using Python 3.9 to ensure that all required packages are fully supported, although Python 3.10 should also cause no problems. You can use the built-in `sys` module to check which Python version you are using.

In [None]:
import sys
print(sys.version)
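
If you prefer an explicit comparison over reading the version string, a minimal sketch could look like the following. The function name `describe_python` and the three labels are hypothetical; the version numbers are the ones named in the text above.

```python
import sys

# Versions named in the text above: 3.9 is recommended, 3.10 is expected to work.
RECOMMENDED = (3, 9)
ALSO_EXPECTED_TO_WORK = (3, 10)


def describe_python(version_info=sys.version_info):
    """Classify a (major, minor, ...) version tuple against the course recommendation."""
    major_minor = tuple(version_info[:2])
    if major_minor == RECOMMENDED:
        return "recommended"
    if major_minor == ALSO_EXPECTED_TO_WORK:
        return "should work"
    # Other versions may well work; they are just not covered by the recommendation.
    return "untested"


print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {describe_python()}")
```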

## Core Data Science Packages

### NumPy

NumPy (short for "Numerical Python") is a widely used Python package for numerical computing. It provides efficient operations on large, multi-dimensional arrays and matrices. NumPy is a fundamental library in the Python scientific computing ecosystem and serves as a building block for many other data science and machine learning libraries.

In [None]:
import numpy
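
Beyond the bare import, a short smoke test can confirm that array operations work. This is only an illustrative sketch; the variable names are arbitrary.

```python
import numpy as np

print(np.__version__)

# A small 2x3 array: [[0, 1, 2], [3, 4, 5]]
matrix = np.arange(6).reshape(2, 3)

# Sum down each column (axis=0) and scale every element at once.
column_sums = matrix.sum(axis=0)
scaled = matrix * 10

print(column_sums)
print(scaled)
```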

### Pandas

Pandas is a Python library for data manipulation and analysis. It provides high-performance, easy-to-use data structures and tools for working with structured and tabular data. Pandas builds on top of NumPy and extends it with features tailored to data analysis, such as data exploration, data cleaning, and feature engineering.

In [None]:
import pandas
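
As a quick check that pandas works beyond the import, the following sketch builds a tiny table and aggregates it. The column names and values are made up for illustration.

```python
import pandas as pd

# A made-up table with one categorical and one numeric column.
df = pd.DataFrame(
    {"city": ["A", "B", "A", "B"], "sales": [10, 20, 30, 40]}
)

# Total sales per city.
totals = df.groupby("city")["sales"].sum()
print(totals)
```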

### Scikit-Learn

Scikit-learn, commonly abbreviated as "sklearn" (which is also its import name), is a machine learning library that provides a wide range of algorithms and tools for data mining, data analysis, and machine learning. It is built on top of other scientific Python libraries such as NumPy, SciPy, and matplotlib. Scikit-learn covers tasks including classification, regression, clustering, and dimensionality reduction, and its consistent interface makes it useful for both beginners and experienced practitioners.

In [None]:
import sklearn

### Matplotlib & Seaborn

Matplotlib is a popular Python library for creating static, animated, and interactive visualizations. It provides a comprehensive set of tools for generating plots, charts, histograms, scatter plots, and many other types of visualizations, and it is widely used in data analysis and scientific research. It offers a wide range of plotting options and customization features, and integrates with other scientific libraries.

Seaborn is a Python data visualization library built on top of Matplotlib. It provides a high-level interface for creating visually appealing and informative statistical graphics, making it easier to generate complex visualizations with fewer lines of code. Seaborn is particularly useful for exploring and visualizing relationships in datasets, for example during exploratory data analysis, statistical modeling, or when communicating results.

In [None]:
import matplotlib
import seaborn

### NetworkX

The `networkx` package is a Python library for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. It provides data structures for graphs, nodes, and edges, together with a comprehensive set of algorithms for analyzing and visualizing networks. `networkx` is used in diverse fields, including social network analysis, bioinformatics, recommendation systems, and transportation networks.

In [None]:
import networkx
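
To confirm that graph construction and a basic algorithm work, a small sketch such as the following can be run. The node labels are arbitrary.

```python
import networkx as nx

# An undirected path graph a - b - c - d.
graph = nx.Graph()
graph.add_edges_from([("a", "b"), ("b", "c"), ("c", "d")])

# Shortest path between the two end nodes, and the degree of every node.
path = nx.shortest_path(graph, "a", "d")
degrees = dict(graph.degree())

print(path)
print(degrees)
```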

## Additional Packages

### Efficient-Apriori

`efficient-apriori` is a Python library that implements the Apriori algorithm, a popular algorithm for frequent itemset mining and association rule learning. The library provides a fast and memory-efficient implementation of the algorithm, making it suitable for analyzing large datasets. It finds frequent itemsets in a transactional dataset, where each transaction is a set of items, by calculating the support of each itemset, that is, how frequently the itemset occurs in the dataset. The library then generates association rules from the frequent itemsets, allowing you to discover relationships between items.

In [None]:
import efficient_apriori
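
To make the notions of support and association rules concrete, here is a minimal pure-Python sketch. It is not the `efficient-apriori` implementation and does not use its API: it enumerates candidate itemsets by brute force, without the pruning that makes Apriori efficient. The toy transactions and the function names `support`, `frequent_itemsets`, and `confidence` are hypothetical.

```python
from itertools import combinations

# Hypothetical toy dataset: each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "eggs"},
    {"bread", "milk", "eggs"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for transaction in transactions if itemset <= transaction)
    return hits / len(transactions)


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force search for itemsets with support >= min_support (no Apriori pruning)."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            candidate_support = support(candidate, transactions)
            if candidate_support >= min_support:
                result[candidate] = candidate_support
    return result


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


frequent = frequent_itemsets(transactions, min_support=0.5)
print(frequent)
print(confidence({"bread"}, {"milk"}, transactions))
```

On real data, the library's pruning of infrequent candidates is what keeps this search tractable; the brute-force loop here is only meant to show what is being counted.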