Repository for the Natural Language Processing (NLP) project by Ednalyn C. De Dios, Jason Dunn and Chad Hackney, in the Data Science program at Codeup, San Antonio, Texas.
The repo will contain all project-related materials, including the final presentation, the primary project Jupyter Notebook, and the 'helper' Python files acquire.py, preparation.py, explore.py and model.py.
Final project presentation: https://docs.google.com/presentation/d/1rmbv9_Fujs50alcOsIi6RGRaCrAp4Akbexeo6D1UrRQ/edit#slide=id.p
Project start: 9 May 2019; project due date: 13 May 2019.
The project involves:
- web-scraping a designated set of webpages/GitHub repositories;
- retrieving the README.md file associated with each GitHub repository;
- combining and analyzing the words contained in the README files;
- modeling those words to predict the primary programming language of each repository.
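The modeling step can be illustrated with a minimal sketch. It assumes a multinomial naive Bayes classifier over README word counts, written with only the Python standard library; the project's own model.py may use a different technique. The `tokenize` helper, the `NaiveBayesLanguageGuesser` class and the toy READMEs are hypothetical, made up for illustration rather than taken from the project data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep only runs of ASCII letters (a deliberately simple cleaning step)."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesLanguageGuesser:
    """Hypothetical multinomial naive Bayes over README word counts, with add-one (Laplace) smoothing."""

    def fit(self, readmes, languages):
        # Number of READMEs per language, used for the prior P(language).
        self.doc_counts = Counter(languages)
        # Word frequencies per language, pooled across that language's READMEs.
        self.word_counts = defaultdict(Counter)
        for text, lang in zip(readmes, languages):
            self.word_counts[lang].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {lang: sum(c.values()) for lang, c in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best_lang, best_score = None, -math.inf
        for lang in self.doc_counts:
            # Work in log space to avoid underflow when multiplying many small probabilities.
            score = math.log(self.doc_counts[lang] / n_docs)
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[lang][word]
                score += math.log((count + 1) / (self.total_words[lang] + v))
            if score > best_score:
                best_lang, best_score = lang, score
        return best_lang


# Toy, made-up READMEs for illustration only (not project data).
readmes = [
    "Install with pip and run the Django management command",
    "Python script using pandas to clean election data",
    "Run npm install then npm start to launch the React app",
    "JavaScript widget built with webpack and npm",
]
languages = ["Python", "Python", "JavaScript", "JavaScript"]

model = NaiveBayesLanguageGuesser().fit(readmes, languages)
print(model.predict("pip install the pandas requirements"))  # Python
```

Naive Bayes is a common baseline for text classification because it trains in a single counting pass; with real README data, a held-out test split would be needed to judge how well it predicts the language.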
GitHub repositories used:
- all GitHub repos associated with the Texas Tribune's main GitHub page: https://github.com/texastribune
From the Texas Tribune's public website, https://www.texastribune.org: "The Texas Tribune is the only member-supported, digital-first, nonpartisan media organization that informs Texans — and engages with them — about public policy, politics, government and statewide issues."
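The acquisition step for the texastribune organization can be sketched as follows. This sketch swaps in the GitHub REST API (the `/orgs/{org}/repos` and `/repos/{owner}/{repo}/readme` endpoints) in place of HTML web-scraping, so it is not necessarily how acquire.py works, and the helper names are hypothetical. Each repository object in the listing carries a `language` field (GitHub's guess at the primary language, which may be `None`), which could serve as the label the model tries to predict.

```python
import base64
import json
import urllib.error
import urllib.request

API_ROOT = "https://api.github.com"


def org_repos_url(org, page=1, per_page=100):
    """URL for one page of an organization's public repositories (per_page is capped at 100)."""
    return f"{API_ROOT}/orgs/{org}/repos?per_page={per_page}&page={page}"


def readme_url(owner, repo):
    """URL of the endpoint that returns a repository's README as JSON."""
    return f"{API_ROOT}/repos/{owner}/{repo}/readme"


def decode_readme(payload):
    """The README endpoint returns the file base64-encoded in the 'content' field."""
    # b64decode discards the embedded newlines GitHub inserts into the encoded text.
    return base64.b64decode(payload["content"]).decode("utf-8", errors="replace")


def fetch_json(url, token=None):
    """GET a URL and parse the JSON body; an optional token raises the unauthenticated rate limit."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"token {token}"
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def acquire(org="texastribune", token=None):
    """Yield (repo name, primary language, README text) for every public repo in the org."""
    page = 1
    while True:
        repos = fetch_json(org_repos_url(org, page), token)
        if not repos:
            break  # an empty page means every repository has been listed
        for repo in repos:
            try:
                payload = fetch_json(readme_url(org, repo["name"]), token)
            except urllib.error.HTTPError:
                continue  # e.g. 404 when the repository has no README
            yield repo["name"], repo["language"], decode_readme(payload)
        page += 1
```

Unauthenticated requests are subject to a low hourly rate limit, so passing a personal access token through `token` is advisable for an organization with many repositories.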