This course gives a comprehensive overview of data science, a growing field that combines coding, math, statistics, and often business acumen. The workshop starts by walking through the entire process of a data science project and the different roles and skills it requires. It then introduces the basics of obtaining data from a variety of sources, including web APIs and page scraping.
Throughout the workshop, the Python programming language is introduced and used as the primary tool for handling, analyzing, visualizing, and presenting data. Participants will gain a solid understanding of how to program in Python and how to use it together with scientific computing modules and libraries to analyze data.
We'll also take a look at powerful techniques for analyzing data, and cover techniques for planning, performing, and presenting your projects, to help you get started in data science and make the most of the data that's all around you. So let's get started with Introduction to Data Science.
Topics & Workshop Session Materials
1. Introduction to Data Science
Overview of Data Science
During this session you'll learn about data science, the goals and objectives of the Data Science Specialization, and each of its components. You'll also learn why data scientists are now in such demand and which skills are required to succeed in different jobs. Business cases are presented as case studies during the session to examine and discuss the economic potential of data science.
- What data science is, and its history
- Demand for data science
- Business cases and economic potential
- The impact of data size
- Big data vs. data science
- The role of a data scientist and their impact on this field
- The data science life cycle
- Data types: structured, unstructured, and everything in between
John Canny (Fall 2014), Introduction to Data Science course, University of California, Berkeley. Retrieved from the data science course at Berkeley.
Charles Tappert, Data Analytics Lifecycle, Seidenberg School of CSIS, Pace University. Retrieved from the case study.
Booz Allen Hamilton (2015), The Field Guide to Data Science, 2nd Edition. Retrieved from The Field Guide to Data Science book.
2. Getting started with Python
Data science has been described as the intersection of programming, statistics, and topical expertise. Python is an excellent programming tool for data analysis because it is friendly, pragmatic, and mature, and because it is complemented by excellent third-party packages designed to deal with large amounts of data. In this session we will set up the analysis environment and go through a refresher on the basics of working with data containers in Python. These containers are useful on their own, and they set the model for the more powerful data objects of NumPy and Pandas. We will then move on to the big stuff: the power of arrays, indexing, and DataFrames in NumPy and Pandas.
- Introduction to Python
- Using Python lists
- Learning NumPy
- Assignment 1: anagram challenge overview
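To make the step from plain Python containers to NumPy and Pandas concrete, here is a minimal sketch. The temperature values, city labels, and variable names are invented for illustration and are not part of the workshop material.

```python
import numpy as np
import pandas as pd

# A plain Python list: flexible, but arithmetic needs an explicit loop
# (here, a list comprehension).
temps_c = [21.5, 19.0, 23.2, 18.4]
temps_f_list = [t * 9 / 5 + 32 for t in temps_c]

# The same data as a NumPy array: arithmetic applies to every element at once.
temps_arr = np.array(temps_c)
temps_f_arr = temps_arr * 9 / 5 + 32

# Boolean indexing keeps only the elements that satisfy a condition.
warm_days = temps_arr[temps_arr > 20]

# A pandas DataFrame adds labelled columns on top of array-like data.
# The city labels are hypothetical.
df = pd.DataFrame({"city": ["A", "B", "C", "D"], "temp_c": temps_c})
warm_cities = df.loc[df["temp_c"] > 20, "city"].tolist()

print(warm_days)
print(warm_cities)
```

The list version and the array version compute the same values; the array version simply expresses the whole operation without a loop, which is the pattern NumPy and Pandas build on.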
Jake VanderPlas (2016), Python's Data Science Stack, eScience Institute, University of Washington. Retrieved from Python_Data_Science_Stack.
Marci, Complete-Python-Bootcamp, Columbia University. Retrieved from GitHub.
3. Exploratory Data Analysis (EDA, 8 hours)
This session presents the assumptions, principles, and techniques necessary to gain insight into data via exploratory data analysis (EDA).
We will cover the essential exploratory techniques for summarizing data. These techniques are typically applied before formal modeling begins and can help inform the development of more complex statistical models. Exploratory techniques are also important for eliminating or sharpening potential hypotheses about the world that can be addressed by the data.
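As a rough illustration of what summarizing data before modeling can look like, the sketch below uses pandas on a small, made-up sales table. The column names and values are assumptions for the example, and the 1.5 × IQR outlier rule is one common convention, not necessarily the one used in the session.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
sales = pd.DataFrame({
    "region": ["north", "south", "north", "east", "south", "north"],
    "units": [12, 7, 15, 3, 9, 40],
    "price": [9.99, 12.50, 9.99, 20.00, 12.50, 9.99],
})

# Numeric summary: count, mean, standard deviation, min, quartiles, max.
summary = sales.describe()

# How often each category occurs.
region_counts = sales["region"].value_counts()

# Compare a numeric variable across groups.
mean_units_by_region = sales.groupby("region")["units"].mean()

# Pairwise linear correlation between the numeric columns.
correlation = sales[["units", "price"]].corr()

# Flag potential outliers with the 1.5 * IQR rule.
q1, q3 = sales["units"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = sales[(sales["units"] < q1 - 1.5 * iqr) | (sales["units"] > q3 + 1.5 * iqr)]

print(summary)
print(outliers)
```

Summaries like these often suggest which hypotheses are worth modeling formally and which observations deserve a closer look first.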
Upon completion of this session, participants will be able to:
- Use a variety of basic techniques in understanding and interpreting data;
- Apply elementary statistical methods in analyzing business scenarios and problems;
- Think critically and creatively about the uses and limitations of statistical methods in business;
- Use a statistical package and interpret its output, and appreciate the applications of information technology for statistical analysis in business.
Session Outline & References
EDA Techniques and Data Visualization
5. Data Engineering
During this session we highlight the importance and role of the data engineering process (also called data infrastructure or data architecture) in data analysis. The data engineer gathers and collects the data, stores it, runs batch or real-time processing on it, and serves it via an API to a data scientist, who can then query it easily.
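The collect, store, process, and serve steps can be sketched in miniature with Python's built-in sqlite3 module. This is only a toy under stated assumptions: the event records, table names, and function names are hypothetical, an in-memory database stands in for real storage, and a plain function stands in for the API.

```python
import sqlite3

# Hypothetical raw events (day, kind, count), invented for illustration.
raw_events = [
    ("2024-01-01", "page_view", 3),
    ("2024-01-01", "purchase", 1),
    ("2024-01-02", "page_view", 5),
]

def store(conn, events):
    """Store collected events in a table."""
    conn.execute("CREATE TABLE IF NOT EXISTS events (day TEXT, kind TEXT, n INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", events)
    conn.commit()

def batch_aggregate(conn):
    """Batch step: rebuild a summary table from all stored events."""
    conn.execute("DROP TABLE IF EXISTS totals_by_kind")
    conn.execute(
        "CREATE TABLE totals_by_kind AS "
        "SELECT kind, SUM(n) AS total FROM events GROUP BY kind"
    )
    conn.commit()

def query_total(conn, kind):
    """Serving step: the kind of lookup an API endpoint might expose."""
    row = conn.execute(
        "SELECT total FROM totals_by_kind WHERE kind = ?", (kind,)
    ).fetchone()
    return row[0] if row else 0

conn = sqlite3.connect(":memory:")
store(conn, raw_events)
batch_aggregate(conn)
print(query_total(conn, "page_view"))
```

The point of the separation is that the data scientist only needs the serving function; how the data was gathered and aggregated stays behind it.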
One important part of this session is data preprocessing. The quality of the data and the amount of useful information it contains are key factors that determine how well a machine learning algorithm can learn. It is therefore critical that we examine and preprocess a dataset before we feed it to a learning algorithm. In this session, we will discuss the essential data preprocessing techniques that will help us build good machine learning models.
Objectives: Data preprocessing
By the end of this session, participants will be able to:
- Understand the important role of data engineering
- Implement and apply several data acquisition and preprocessing techniques using Python
- Understand the stages of data preprocessing and their role in developing a data process model
- Apply the learned techniques to develop a data science model
The topics that we will cover in this session are as follows:
- Removing and imputing missing values from the dataset
- Getting categorical data into shape for machine learning algorithms
- Selecting relevant features for the model construction
- Introduction to Data Engineering
- Data collection exercises
- Data preprocessing
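The three preprocessing topics listed above (missing values, categorical data, and feature selection) can be sketched with pandas as follows. The dataset, column names, and size ordering are invented for illustration, and the zero-variance filter is a deliberately simple stand-in for feature selection rather than the method taught in the session.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset, invented for illustration only.
raw = pd.DataFrame({
    "age": [25, np.nan, 47, 35, 52],
    "size": ["M", "L", "S", "M", np.nan],
    "color": ["red", "blue", "red", "green", "blue"],
    "constant": [1, 1, 1, 1, 1],
    "label": [0, 1, 1, 0, 1],
})

# 1. Missing values: remove rows that lack a size, then impute the
#    remaining missing ages with the median age.
clean = raw.dropna(subset=["size"]).copy()
clean["age"] = clean["age"].fillna(clean["age"].median())

# 2. Categorical data: map the ordered feature to integers (assumed
#    ordering S < M < L) and one-hot encode the unordered feature.
clean["size"] = clean["size"].map({"S": 0, "M": 1, "L": 2})
encoded = pd.get_dummies(clean, columns=["color"], dtype=int)

# 3. Feature selection: drop features that never vary, since they
#    carry no information for a model.
features = encoded.drop(columns="label")
selected = features.loc[:, features.var() > 0]

print(selected)
```

Whether to remove or impute missing values depends on how much data is lost by removal; the sketch does both only to show the two options side by side.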
6. Machine Learning
Introduction to Machine Learning
In this session we will focus on practical, hands-on applications of machine learning using the scikit-learn module for Python. Install it using either pip install scikit-learn or conda install scikit-learn, depending on your installation. You can treat each session almost like a micro-project: in each lecture we'll get a brief overview of the mathematics behind the model we'll work with and then dive into the code.
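As a taste of what such a micro-project can look like, here is a minimal sketch of the usual scikit-learn workflow: split the data, fit a model, and evaluate it. The choice of the bundled iris dataset and of logistic regression is an assumption made for this example, not a statement of which models the session covers.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load a small example dataset that ships with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold out 30% of the samples so the model is evaluated on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Fit the model on the training split only.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict on the held-out split and measure the fraction predicted correctly.
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Test accuracy: {accuracy:.2f}")
```

Most scikit-learn estimators follow the same fit and predict pattern, so swapping in a different model usually changes only the line that constructs it.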