<img src="_img/cover.png" alt="Data Analysis" height="256px" align="right">

by Vimalkumar Narasimman | n.vimalkumar@hotmail.com


## What You Will Learn

*Prerequisite: If you don't have a basic knowledge of Python or prior
experience with another language (R, SAS, MATLAB, etc.), work through the
[`ch_01/python_101.ipynb`](./ch_01/python_101.ipynb) Jupyter notebook
for a Python crash course or refresher.*

-   Understand how data analysts and scientists gather and analyze data
-   Perform data analysis and data wrangling in Python
-   Combine, group, and aggregate data from multiple sources
-   Create data visualizations with `pandas`, `matplotlib`, and
    `seaborn`
-   Apply machine learning algorithms with `sklearn` to identify
    patterns and make predictions
-   Use Python data science libraries to analyze real-world datasets
-   Use `pandas` to solve several common data representation and
    analysis problems
-   Collect data from APIs
-   Build Python scripts, modules, and packages for reusable analysis
    code
-   Utilize computer science concepts and algorithms to write more
    efficient code for data analysis
-   Write and run simulations

## Table of Contents

-   [Chapter 1, *Introduction to Data Analysis*](./ch_01), teaches you
    the fundamentals of data analysis, gives you a foundation in
    statistics, and walks you through setting up your environment for
    working with data in Python and using Jupyter Notebooks.

-   [Chapter 2, *Working with Pandas DataFrames*](./ch_02), introduces
    you to the `pandas` library and shows you the basics of working with
    `DataFrames`.

-   [Chapter 3, *Data Wrangling with Pandas*](./ch_03), discusses the
    process of data manipulation, shows you how to explore an API to
    gather data, and guides you through data cleaning and reshaping with
    `pandas`.

-   [Chapter 4, *Aggregating Pandas DataFrames*](./ch_04), teaches you
    how to query and merge DataFrames, perform complex operations on
    them (including rolling calculations and aggregations), and work
    effectively with time series data.

-   [Chapter 5, *Visualizing Data with Pandas and Matplotlib*](./ch_05),
    shows you how to create your own data visualizations in Python,
    first using the `matplotlib` library, and then directly from
    `pandas` objects.

-   [Chapter 6, *Plotting with Seaborn and Customization
    Techniques*](./ch_06), continues the discussion on data
    visualization by teaching you how to use the `seaborn` library for
    visualizing your long-form data and giving you the tools you need to
    customize your visualizations, making them presentation-ready.

-   [Chapter 7, *Financial Analysis: Bitcoin and the Stock
    Market*](./ch_07), walks you through the creation of a [Python
    package for analyzing
    stocks](https://github.com/stefmolin/stock-analysis), building upon
    everything learned in chapters 1-6 and applying it to a financial
    application.

-   [Chapter 8, *Rule-Based Anomaly Detection*](./ch_08), covers
    [simulating
    data](https://github.com/stefmolin/login-attempt-simulator) and
    applying everything learned in chapters 1-6 to catching hackers
    attempting to authenticate to a website, using rule-based strategies
    for anomaly detection.

-   [Chapter 9, *Getting Started with Machine Learning in
    Python*](./ch_09), introduces you to machine learning and building
    models using the `sklearn` library.

-   [Chapter 10, *Making Better Predictions: Optimizing
    Models*](./ch_10), shows you strategies for improving the
    performance of your machine learning models.

-   [Chapter 11, *Machine Learning Anomaly Detection*](./ch_11),
    revisits anomaly detection on login attempt data, using machine
    learning techniques, all while giving you a taste of how the
    workflow looks in practice.

-   [Chapter 12, *The Road Ahead*](./ch_12), contains resources for
    taking your skills to the next level and further avenues for
    exploration.

## Notes on Environment Setup

Environment setup instructions are in chapter 1 of the text. If you
don't have the book, you will need to:

1.  Install Python \>= 3.7 and \< 3.10.
2.  [Set up a virtual
    environment](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/#creating-a-virtual-environment)
    and [activate
    it](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/#activating-a-virtual-environment).
3.  [Fork](https://docs.github.com/en/get-started/quickstart/fork-a-repo)
    and
    [clone](https://docs.github.com/en/get-started/quickstart/fork-a-repo#cloning-your-forked-repository)
    this repository to obtain a local copy of the files.
4.  [Change the current
    directory](https://alligator.io/workflow/command-line-basics-changing-directories/)
    to your local copy of the files.
5.  [Install the required packages using the requirements.txt
    file](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/#using-requirements-files)
    inside that directory (use `pip install -r requirements.txt`).
6.  Launch JupyterLab and use the `ch_01/checking_your_setup.ipynb`
    Jupyter notebook to check your setup.
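Before installing the requirements, it can help to confirm that the interpreter falls in the supported range and that a virtual environment is active. The script below is a minimal sketch using only the standard library; it is *not* the repository's `checking_your_setup.ipynb` notebook, and the helper names `python_version_supported` and `in_virtual_environment` are hypothetical. The virtual-environment check is a heuristic that assumes a `venv`/modern `virtualenv` environment (it compares `sys.prefix` with `sys.base_prefix`), so it may not detect other environment managers such as conda.

```python
import sys

# Supported interpreter range from the setup notes: >= 3.7 and < 3.10.
MIN_VERSION = (3, 7)
MAX_VERSION_EXCLUSIVE = (3, 10)


def python_version_supported(version_info=None):
    """Return True if the given (or running) Python version is in the supported range.

    Hypothetical helper: accepts sys.version_info or any tuple like (3, 9, 1).
    """
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])  # compare only (major, minor)
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


def in_virtual_environment():
    """Heuristic (hypothetical helper): inside a venv, sys.prefix differs from sys.base_prefix."""
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix != base_prefix


if __name__ == "__main__":
    version = ".".join(str(part) for part in sys.version_info[:3])
    if python_version_supported():
        print(f"Python {version} is within the supported range (>=3.7, <3.10).")
    else:
        print(f"Python {version} is outside the supported range (>=3.7, <3.10).")
    if in_virtual_environment():
        print("A virtual environment appears to be active.")
    else:
        print("No virtual environment detected; activate one before installing packages.")
```

Run it with the interpreter you plan to use (for example, `python check_env.py`, where the file name is only illustrative). Comparing only `(major, minor)` keeps patch releases such as 3.9.18 inside the range while still excluding 3.10.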
