RCS Data Analysis using Python course for July-August 2019
FoliumGeoLibrary
Irises_ML_Intro
JSON
Keras_TensorFlow_Image_Recognition
NoSQL
NumPy
OpenCV
Pandas-Cookbook
Pandas
PlotlyVisualizationLib
PowerBI
Projects
PySpark
RegEx
SQL
Titanic
WebScraping
cheat-sheets
data
handson-ml
img
scikit-learn
.gitignore
Data_Analysis_Python_Introduction.pdf
Git_Workflow.md
Jupyter with Python.md
LICENSE
Processing User Input.ipynb
Python Classes.ipynb
Python Cleaning Up Text Files.ipynb
Python Comparison operators.md
Python Conditional Execution Branching.md
Python Data Structures Exercises.ipynb
Python Dictionaries.ipynb
Python Errors.ipynb
Python Exercises.md
Python File IO.ipynb
Python File Operations 2 Binary Files and Pickle in class 21.05.2019.ipynb
Python Flow Control.ipynb
Python Flow Control.md
Python Functions.ipynb
Python Functions.md
Python Introduction.ipynb
Python Learning Resources.ipynb
Python Learning Resources.md
Python List Comprehension.ipynb
Python Lists.ipynb
Python Modules and Imports.ipynb
Python Reading Writing Files.md
Python Sets.ipynb
Python Strings.ipynb
Python Tuples.ipynb
Python Variables and Data Types.ipynb
Python_Dictionaries.md
Python_List_Exercise_1.ipynb
Python_Lists.md
README.md
Reading ipynb.ipynb
requirements.txt


RCS_Data_Analysis_Python_2019_July

RCS Data Analysis using Python course for July-August 2019

Binder (cloud-hosted Jupyter notebooks) Beta

Course Plan

Goal

Build a complete data analysis pipeline using the Python ecosystem

  • Define the problem
  • Gather the raw data
  • Process (clean) the data
  • Explore the data
  • Analyze (apply models, make predictions)
  • Report and visualize the results in a form stakeholders can understand
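
The steps above can be sketched end to end with nothing but the standard library. This is a minimal illustration, not course material: the inline CSV data, the column names (city, temp_c) and the function names (gather, clean, explore, analyze, report) are all made up for the example.

```python
import csv
import io
import statistics

# Hypothetical raw data; in practice this would come from a file, an API or a scraper.
RAW = """city,temp_c
Riga,21.5
Liepaja,
Daugavpils,23.0
Riga,19.5
"""

def gather(text):
    """Gather: parse the raw CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Process: drop rows with a missing temperature and convert the rest to float."""
    return [
        {"city": r["city"], "temp_c": float(r["temp_c"])}
        for r in rows
        if r["temp_c"]
    ]

def explore(rows):
    """Explore: basic summary numbers for a first look at the data."""
    temps = [r["temp_c"] for r in rows]
    return {"count": len(temps), "min": min(temps), "max": max(temps)}

def analyze(rows):
    """Analyze: mean temperature per city."""
    by_city = {}
    for r in rows:
        by_city.setdefault(r["city"], []).append(r["temp_c"])
    return {city: statistics.mean(vals) for city, vals in by_city.items()}

def report(means):
    """Report: one readable line per city, sorted by name."""
    return [f"{city}: {mean:.1f} C" for city, mean in sorted(means.items())]

cleaned = clean(gather(RAW))
summary = explore(cleaned)
means = analyze(cleaned)
for line in report(means):
    print(line)
```

In the course itself, libraries such as Pandas and matplotlib take over most of these steps; the sketch only shows how the stages hand data to one another.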

Setup (2h)

  • Git and GitHub
  • Text Editors
  • Anaconda
  • cloud-based tools (Google Colab, myBinder, etc.)

General Python Introduction (10h)

  • basic data types
  • working with compound data (slicing)
  • structure (functions, classes)
  • program flow (conditionals)
  • input/output
  • importing external libraries
  • introduction to NumPy, Pandas
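
As a small taste of the topics listed above, the following sketch combines slicing, a function and a conditional. The list of temperatures and the 18-degree threshold are arbitrary values chosen for illustration.

```python
# A list is a compound data type; slicing picks out parts of it.
temps = [12, 15, 11, 18, 21, 19, 14]

first_three = temps[:3]    # items at index 0, 1, 2
last_two = temps[-2:]      # the final two items
every_other = temps[::2]   # every second item, starting at index 0
backwards = temps[::-1]    # a reversed copy

# Strings can be sliced the same way.
word = "pipeline"
prefix = word[:4]

def label(temp, threshold=18):
    """Return 'warm' or 'cool' depending on a (made-up) threshold."""
    if temp >= threshold:
        return "warm"
    return "cool"

labels = [label(t) for t in temps]
print(first_three, last_two, every_other, prefix, labels)
```

Slices never modify the original list; each one returns a new object.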

Gathering Data with Python (2-4h)

  • web scraping with Selenium, Beautiful Soup
  • using APIs

Databases

SQL (2-4h)

  • reintroduction to SQL databases
  • ACID compliance
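
The "A" in ACID (atomicity) can be demonstrated with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the accounts table, the names and the balances are invented for the example, and an in-memory database stands in for a real one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [("alice", 100), ("bob", 50)],
)
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts; either both updates apply or neither does."""
    try:
        # Using the connection as a context manager commits on success
        # and rolls back if an exception is raised inside the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance.
        return False

ok = transfer(conn, "alice", "bob", 30)       # enough funds: succeeds
failed = transfer(conn, "alice", "bob", 500)  # overdraft: rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

The second transfer credits bob before the debit fails, yet bob's balance is unchanged afterwards because the whole transaction is rolled back.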

NoSQL (4-6h)

  • Node.js
  • MongoDB
  • other NoSQL databases

Big Data (2-4h)

  • The 4 Vs (volume, variety, velocity, veracity)
  • Apache Hadoop Ecosystem
  • Apache Lucene -> Elasticsearch

Cleaning Data (2-6h)

  • advancing your NumPy, Pandas skills

Analysis and Data Exploration (4-10h)

  • Pandas, matplotlib, etc.

Social Network Analytics

  • Graph Analysis (Network Analysis)

Machine Learning with Python (6-10h)

Note: ML section may be expanded if good progress is made in other sections :)

Principles of ML

  • test/train data
  • supervised/unsupervised learning
  • classifiers
  • regressors
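
To make the train/test idea concrete, here is a pure-Python sketch of a split followed by a one-nearest-neighbour classifier. The toy data set, the helper names (train_test_split, predict_1nn) and the 25% test fraction are assumptions for illustration; in the course, scikit-learn provides production versions of both steps.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows reproducibly and hold out a fraction for testing."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]

def predict_1nn(train, x):
    """Supervised classifier: return the label of the closest training value."""
    nearest = min(train, key=lambda row: abs(row[0] - x))
    return nearest[1]

# Toy labelled data: (feature value, class label).
data = [(v, "small") for v in (1.0, 1.2, 1.5, 1.7, 2.0, 2.1)]
data += [(v, "large") for v in (5.0, 5.3, 5.6, 6.0, 6.2, 6.5)]

train, test = train_test_split(data)

# Evaluate only on rows the model has never seen.
correct = sum(predict_1nn(train, x) == label for x, label in test)
accuracy = correct / len(test)
print(f"train={len(train)} test={len(test)} accuracy={accuracy:.2f}")
```

The two classes here are deliberately far apart, so the held-out rows are easy to classify; real data is rarely this clean, which is why the test set matters.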

ML Tools

  • scikit-learn
  • TensorFlow with Keras
  • PyTorch

Visualization (4-6h)

  • PowerBI OR Tableau
  • Python visualization libraries (matplotlib, Seaborn)
  • Graphviz
  • Dash/Plotly

Useful Python Libraries (2-6h)

  • PDF processing
  • email
  • PyQt
  • NLTK

Building a complete data analysis pipeline (4-6h)

  • Course Project

Tools of the trade:

Anaconda Distribution (Python, R and more): https://www.anaconda.com/download/

Survey

Students, please fill out a brief survey before starting the course: https://docs.google.com/forms/d/e/1FAIpQLSc_ODlnXNKz4uYYokCb2ED-ZAYQRVEurTTuNawAfMPbEd61Rg/viewform?usp=pp_url
