Python

BasicDataAnalysis is the thing I learnt in 1st year

Data-Processing is the thing I learnt in 2nd year

TOPIC 0: Recap on Jupyter Notebooks and some Python structures This section provides a recap of the core concepts you’ve already covered in your first year of Python programming.

Notebook for starters

List, double and dict structures

Slicing in Python

List comprehension

Recommended Reading List for Data Processing with Python

The following book will be the primary course material for this class:

Python for Data Analysis (https://wesmckinney.com/book/)

This book by Wes McKinney provides a comprehensive introduction to data analysis with Python, covering key topics such as data manipulation, analysis with Pandas, and using Numpy and Matplotlib for scientific computing.

To enhance your understanding of the topics covered in this course, the following additional resources are recommended. These materials will support your learning in Python programming, data analysis, and visualization techniques, as well as the Numpy, Pandas, and Matplotlib libraries. You are encouraged to refer to them throughout the course.

General Python Programming

Automate the Boring Stuff with Python by Al Sweigart This book is great for beginners, offering practical Python examples and exercises to help you automate common tasks with Python. 3. Numpy Library

The Numpy Documentation This is the official documentation for the Numpy library, covering everything from installation to advanced functions.

“Learning Numpy” by Ivan Idris A beginner-friendly guide that introduces the Numpy library and how to use it for scientific computing.

Matplotlib and Data Visualization

Matplotlib Documentation The official Matplotlib documentation offers detailed instructions on how to use the library for data visualization.

Pandas Library

Pandas Documentation This is the official documentation for the Pandas library, covering data manipulation and analysis using DataFrames.

“Effective Pandas” by Matt Harrison A concise and practical guide to mastering Pandas for data analysis tasks.

Additional Resources

“Real Python” (realpython.com) An excellent resource with tutorials and articles on Python programming, data analysis, and visualization techniques.

“Kaggle” (https://www.kaggle.com/learn/python) Offers free tutorials and hands-on practice on Python, data analysis, and data visualization through Kaggle’s datasets.

TOPIC 1: Numpy Library

The Numpy library is one of the core libraries for numerical computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a wide range of mathematical functions to operate on these arrays efficiently. Numpy is essential for many data science and machine learning applications.

Numpy Learning Materials:

1. Numpy Notebook: Explore Numpy's capabilities through a detailed notebook covering array creation, element-wise operations, indexing, slicing, broadcasting, and mathematical functions for efficient scientific computing.

2. Speed Test: Numpy Arrays vs Python Lists: Explore how Numpy significantly improves computational efficiency by comparing the speed of element-wise operations in Numpy arrays (ndarrays) versus native Python lists through a detailed speed test experiment.

3. Small Exercise for You: Practice calculating Body Mass Index (BMI) for 10 individuals using both regular Python lists and Numpy arrays. This exercise will help reinforce your understanding of Numpy’s array manipulation and mathematical capabilities.

Additional Resources: Numpy Documentation & Homepage: Learn more about Numpy from its official homepage, which includes full documentation and tutorials.

Numpy Tutorial on W3Schools: Dive into the W3Schools Numpy tutorial for a deeper exploration of its functions and applications.

TOPIC 2: Data Visualization

Data visualization is a crucial aspect of data analysis, enabling you to visually communicate insights, trends, and patterns effectively. Python provides several powerful libraries for creating informative and attractive visualizations. In this section, we will focus on two key libraries: Matplotlib and Seaborn.

Learning Materials:

Matplotlib Notebook: Explore the fundamentals of Matplotlib by learning how to create various types of plots such as line charts, bar charts, scatter plots, histograms, and pie charts
Matplotlib with Real-World Dataset: Apply your knowledge of Matplotlib to real-world datasets in this hands-on notebook.
Seaborn Notebook: Learn how to use Seaborn, a Python data visualization library based on Matplotlib that provides a high-level interface for creating attractive and informative statistical graphics.

Additional Resources:

Matplotlib Documentation & Tutorials: Learn more about Matplotlib from its official homepage, which includes extensive documentation, tutorials, and examples of various plot types.
Seaborn Documentation & Examples: Explore the Seaborn library through its official documentation, which provides tutorials, examples, and tips for customizing and enhancing your visualizations.
Matplotlib Tutorial on W3Schools: Deepen your understanding of Matplotlib’s capabilities with a detailed tutorial on W3Schools, covering various plot types, customization options, and more.
Seaborn Tutorial on W3Schools: Learn how to create advanced visualizations with Seaborn by exploring the W3Schools tutorial, which includes practical examples and step-by-step guidance.

TOPIC 3: Data Manipulation with Pandas

Pandas is a powerful and versatile Python library for data manipulation and analysis. It provides data structures like Series and DataFrame to manage and analyze data efficiently, making it an essential tool for any data processing task. In this section, we will explore how to use Pandas for data cleaning, transformation, analysis, and manipulation.

Learning Materials:

Pandas Fundamentals Notebook:

Learn the basics of Pandas, including how to create and manipulate Series and DataFrame objects, handle missing data, and perform basic data operations such as filtering, grouping, and sorting.

Pandas with Real-World Dataset:

Dive into practical applications of Pandas by working with real-world datasets [Download CSV Dataset]. This notebook will guide you through common data cleaning tasks, including handling missing values, transforming data, and preparing it for analysis.

Advanced Pandas Operations Notebook:

Explore more advanced Pandas features such as pivot tables, merging, joining, reshaping data, and working with time series data.

Additional Resources:

Pandas Documentation & Tutorials:

The official Pandas documentation, which provides a comprehensive guide to the library, including tutorials, examples, and best practices for data manipulation.

Pandas Tutorial on W3Schools:

A beginner-friendly guide to Pandas, offering practical examples and step-by-step instructions to help you master the library's core functionalities.

Pandas Cheat SheetAvaa tämä dokumentti ReadSpeaker docReaderillä :

A quick reference guide for common Pandas operations, ideal for refreshing your knowledge or finding specific commands quickly.

TOPIC 4: Analyzing Example Dataset

We will apply the data manipulation techniques learned in the previous topics to a real-world dataset. The dataset used here is a shop satisfaction survey that contains responses from participants along with relevant information about their demographics, purchasing behavior, and satisfaction levels.

You will learn how to do data cleaning, explore descriptive statistics, visualize the data, and examine the relationships between variables using cross-tabulation and hypothesis testing.

By the end of this session, you should be able to effectively apply data processing and analysis techniques to any dataset. You will use a similar dataset for your project.

Learning Materials:

Predictive Analytics with Python 2024

TOPIC 1: Supervised Learning (k-Nearest Neighbors Algorithm)

Supervised learning is a fundamental approach in predictive analytics, where a model learns from labeled data to make predictions. In this topic, we will focus on the k-Nearest Neighbors (kNN) algorithm, a simple yet powerful classification technique.

Learning Materials:

Lecture 01 Slides : Supervised Learning and the kNN Algorithm

Code Examples:

kNN Algorithm Using the sklearn Library: For hands-on implementation, use this notebook with sklearn to easily apply the kNN algorithm. (Recommended for Assignment 1)
kNN Algorithm from Scratch (For Practice Only): Understand the inner workings of the kNN algorithm with a Python implementation built from scratch. (Note: This is for practice and not required for Assignment 1).
Cross Validation: Cross-validation is essential to ensure your model generalizes well to new data. We’ll use k-fold cross-validation with the kNN algorithm to assess performance.
kNN with k-Fold Cross Validation (Notebook): Explore how to implement k-fold cross-validation using sklearn and kNN to improve model reliability.
Explanation of k-Fold Cross Validation (External Link): Review this linked article for an in-depth explanation of k-fold cross-validation.

Additional Resources: Here are some resources to deepen your understanding about the field Data Science:

What is Data Science: Understand the fundamentals of data science.
Data Science vs. Machine Learning: Discover the differences between data science and machine learning and how they complement each other.

Name		Name	Last commit message	Last commit date
Latest commit History 80 Commits
BasicDataAnalysis		BasicDataAnalysis
Data-Processing		Data-Processing
Predictive-Analysis		Predictive-Analysis
.DS_Store		.DS_Store
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Python

BasicDataAnalysis is the thing I learnt in 1st year

Data-Processing is the thing I learnt in 2nd year

Recommended Reading List for Data Processing with Python

TOPIC 1: Numpy Library

Numpy Learning Materials:

TOPIC 2: Data Visualization

Learning Materials:

Additional Resources:

TOPIC 3: Data Manipulation with Pandas

Learning Materials:

Pandas Fundamentals Notebook:

Pandas with Real-World Dataset:

Advanced Pandas Operations Notebook:

Additional Resources:

Pandas Documentation & Tutorials:

Pandas Tutorial on W3Schools:

Pandas Cheat SheetAvaa tämä dokumentti ReadSpeaker docReaderillä :

TOPIC 4: Analyzing Example Dataset

Learning Materials:

Predictive Analytics with Python 2024

TOPIC 1: Supervised Learning (k-Nearest Neighbors Algorithm)

Learning Materials:

Code Examples:

Additional Resources: Here are some resources to deepen your understanding about the field Data Science:

About

Uh oh!

Releases

Packages

Languages

andylovecloud/Python

Folders and files

Latest commit

History

Repository files navigation

Python

BasicDataAnalysis is the thing I learnt in 1st year

Data-Processing is the thing I learnt in 2nd year

Recommended Reading List for Data Processing with Python

TOPIC 1: Numpy Library

Numpy Learning Materials:

TOPIC 2: Data Visualization

Learning Materials:

Additional Resources:

TOPIC 3: Data Manipulation with Pandas

Learning Materials:

Pandas Fundamentals Notebook:

Pandas with Real-World Dataset:

Advanced Pandas Operations Notebook:

Additional Resources:

Pandas Documentation & Tutorials:

Pandas Tutorial on W3Schools:

Pandas Cheat SheetAvaa tämä dokumentti ReadSpeaker docReaderillä :

TOPIC 4: Analyzing Example Dataset

Learning Materials:

Predictive Analytics with Python 2024

TOPIC 1: Supervised Learning (k-Nearest Neighbors Algorithm)

Learning Materials:

Code Examples:

Additional Resources: Here are some resources to deepen your understanding about the field Data Science:

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages