Natural Language Processing project, web-scraping, at Codeup, San Antonio, for De Dios, Dunn and Hackney
.gitignore
README.md
chad-project.ipynb
dd-project.ipynb
github_readme.json
jason-project.ipynb
master-project.ipynb

README.md

Repository for Natural Language Processing (NLP) project, for Ednalyn C. De Dios, Jason Dunn and Chad Hackney, in the Data Science program at Codeup, San Antonio, Texas.

This repo contains all project-related materials, including the final presentation, the primary project Jupyter Notebook, and the 'helper' Python files acquire.py, preparation.py, explore.py and model.py.

Final project presentation: https://docs.google.com/presentation/d/1rmbv9_Fujs50alcOsIi6RGRaCrAp4Akbexeo6D1UrRQ/edit#slide=id.p

Project start: 9 May 19

Project due date: 13 May 19

Project involves:

- web-scraping a designated set of webpages/GitHub repositories;
- retrieving the README.md file associated with each GitHub repo;
- combining and analyzing the words contained in the README files;
- modeling the words to predict the programming language used in each repository.
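The modeling step above can be illustrated with a minimal sketch. It is not the project's actual model (that lives in the notebooks): it assumes a simple bag-of-words Naive Bayes classifier written with the standard library only, and the function names `tokenize`, `train` and `predict`, as well as the sample READMEs in `SAMPLE`, are hypothetical and made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens of two or more letters."""
    return re.findall(r"[a-z]{2,}", text.lower())


def train(readmes):
    """Count words per language from (readme_text, language) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, language in readmes:
        word_counts[language].update(tokenize(text))
        doc_counts[language] += 1
    return word_counts, doc_counts


def predict(text, word_counts, doc_counts):
    """Return the language with the highest Naive Bayes log-probability."""
    vocab = {word for counts in word_counts.values() for word in counts}
    total_docs = sum(doc_counts.values())
    best_language, best_score = None, float("-inf")
    for language, counts in word_counts.items():
        total_words = sum(counts.values())
        # Start from the log prior: share of training READMEs in this language.
        score = math.log(doc_counts[language] / total_docs)
        for word in tokenize(text):
            # Laplace (add-one) smoothing so unseen words do not zero the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_language, best_score = language, score
    return best_language


# Hypothetical sample data, not taken from the scraped repositories.
SAMPLE = [
    ("Install with pip and run the Django server", "Python"),
    ("Requires pandas and a virtualenv; pip install the requirements", "Python"),
    ("Run npm install, then npm start to launch the node app", "JavaScript"),
    ("Bundle with webpack and npm; uses the node runtime", "JavaScript"),
]

word_counts, doc_counts = train(SAMPLE)
print(predict("pip install pandas", word_counts, doc_counts))
print(predict("npm start the node server", word_counts, doc_counts))
```

In practice the (README text, language) pairs would come from the scraped data rather than a hard-coded list, and a library vectorizer and classifier would likely replace the hand-rolled counting.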

GitHub repositories used:

From its public website, https://www.texastribune.org : "The Texas Tribune is the only member-supported, digital-first, nonpartisan media organization that informs Texans — and engages with them — about public policy, politics, government and statewide issues."
