Data Science Experiment 1
As a first venture into the world of Data Science with Python, I found a recipe for Harvesting & Geolocating Twitter Data in a great book by Tony Ojeda, Sean Patrick Murphy, Benjamin Bengfort and Abhijit Dasgupta, “Practical Data Science Cookbook: 89 hands-on recipes to help you complete real-world data science projects in R and Python.”
Project Goals:
- Create & Build Twitter API
- Use Python to determine Twitter followers, friends, and pull Twitter user profiles
- Store JSON Twitter data to disk and to MongoDB using PyMongo
- Explore the geographic information available in profiles
- Visualize geographic information using Python
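To make the storage and exploration goals a little more concrete, here is a minimal sketch that uses only the Python standard library. The three profile records are made-up placeholders, not real Twitter data, and the helper names `save_profiles`, `load_profiles`, and `count_locations` are hypothetical. The only thing assumed about real profiles is that they carry a free-text `location` field, which may be empty or missing.

```python
import json
import os
import tempfile
from collections import Counter


def save_profiles(profiles, path):
    """Write a list of profile dicts to disk as a single JSON document."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(profiles, fh, indent=2)


def load_profiles(path):
    """Read the list of profile dicts back from disk."""
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)


def count_locations(profiles):
    """Tally the non-empty, free-text 'location' values across profiles."""
    locations = ((p.get("location") or "").strip() for p in profiles)
    return Counter(loc for loc in locations if loc)


# Placeholder records for illustration only (not real Twitter data).
sample_profiles = [
    {"id": 1, "screen_name": "example_a", "location": "Washington, DC"},
    {"id": 2, "screen_name": "example_b", "location": ""},
    {"id": 3, "screen_name": "example_c", "location": "Washington, DC"},
]

path = os.path.join(tempfile.gettempdir(), "profiles_example.json")
save_profiles(sample_profiles, path)
location_counts = count_locations(load_profiles(path))
print(location_counts.most_common(1))
```

Because the location field is free text typed by the user, counting raw strings like this is only a first look; turning them into coordinates for a map is a separate geocoding step.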
Practical Data Science Cookbook: 89 hands-on recipes to help you complete real-world data science projects in R and Python, by Tony Ojeda, Sean Patrick Murphy, Benjamin Bengfort and Abhijit Dasgupta. This is an excellent resource for getting started in the world of Data Science with R and Python, and I am very thankful to these guys for putting together a fantastic book!
- SciPy - A Python-based ecosystem of open source software for math, science, and engineering that includes a number of useful libraries for machine learning, scientific computing, and modeling.
- NumPy - The foundational Python package for numerical computation. NumPy is the reason that Python can do efficient, large-scale numerical computation that other interpreted or scripting languages cannot do.
- Pandas - Provides a robust data frame object and many additional tools to make traditional data and statistical analysis fast and easy.
- [Twitter API](https://dev.twitter.com/rest/public) - Provides programmatic access to read and write Twitter data.
- Twython - An actively maintained, pure Python wrapper for the Twitter API. Supports both normal and streaming Twitter APIs.
- MongoDB - A document database that stores JSON-like records, which makes working with Twitter's JSON data easy, flexible, and scalable.
- PyMongo - A Python distribution containing tools for working with MongoDB; it is the recommended way to work with MongoDB from Python.
- Folium - Folium builds on the data wrangling strengths of the Python ecosystem and the mapping strengths of the Leaflet.js library. Manipulate your data in Python, then visualize it on a Leaflet map via Folium.
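As a rough idea of how these pieces fit together when pulling user profiles, here is a small sketch of batching user IDs before looking them up. It is not the book's recipe. The helper names `chunked` and `fetch_profiles` are hypothetical, and the `twitter` argument is assumed to be a Twython client, or any object with a compatible `lookup_user` method. The batch size of 100 reflects the per-request limit of the Twitter REST user lookup endpoint.

```python
def chunked(ids, size=100):
    """Yield successive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_profiles(twitter, user_ids, batch_size=100):
    """Look up user profiles in batches and return them as one list.

    `twitter` is assumed to be a Twython client (or any object exposing a
    `lookup_user(user_id=...)` method that returns a list of profile dicts).
    """
    profiles = []
    for batch in chunked(list(user_ids), batch_size):
        # The IDs for one request are sent as a comma-separated string.
        id_string = ",".join(str(user_id) for user_id in batch)
        profiles.extend(twitter.lookup_user(user_id=id_string))
    return profiles
```

With real credentials, the client would typically be built as `Twython(APP_KEY, APP_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET)`, where the four values are placeholders for your own keys. The returned list of dicts can then be written to disk as JSON or inserted into a MongoDB collection with PyMongo.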