The purpose of this project is to develop an understanding of the JSON file format and of how unstructured text data can be stored in a PostgreSQL database and used in Python.
For JSON parsing code, please see:
https://github.com/Daniel-Elston/JSON_to_PGSQL/blob/master/Notebooks/B1_JSON_Exploration/json_exploration_3.ipynb
Textual data is often unstructured and can be extremely messy. Storing this kind of data appropriately is essential for building ML models and generating insights.
In the first stage of this project, raw unstructured data in JSON format will be parsed using Python and then stored in a PostgreSQL database. Once the data has been stored in an organised manner, PostgreSQL queries will be used to export it, ready for processing in Python.
https://www.reddit.com/r/all.json
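The response from the URL above is a Reddit "Listing", in which each post sits under `data.children[i].data`. Below is a minimal sketch, assuming that shape, of flattening such a response into one flat record per post before it is loaded into a database. The `SAMPLE` string is a hypothetical, trimmed-down response so the code runs offline, and the `FIELDS` selection and `flatten_listing` helper are illustrative assumptions, not the project's actual parsing code (see the notebook linked above for that).

```python
import json
from typing import Any

# Hypothetical, trimmed-down response shaped like a Reddit listing;
# real responses carry many more fields per post.
SAMPLE = """
{
  "kind": "Listing",
  "data": {
    "after": "t3_abc124",
    "children": [
      {"kind": "t3", "data": {"id": "abc123", "subreddit": "python",
        "title": "Parsing JSON", "selftext": "", "score": 42,
        "num_comments": 7, "created_utc": 1700000000.0}},
      {"kind": "t3", "data": {"id": "abc124", "subreddit": "datasets",
        "title": "Nested data", "selftext": "Some body text", "score": 5,
        "num_comments": 0, "created_utc": 1700000100.0}}
    ]
  }
}
"""

# Columns kept for each post (an assumption, not the project's schema).
FIELDS = ("id", "subreddit", "title", "selftext", "score", "num_comments", "created_utc")


def flatten_listing(raw: str) -> list[dict[str, Any]]:
    """Turn a listing JSON string into one flat dict per post."""
    listing = json.loads(raw)
    rows = []
    for child in listing["data"]["children"]:
        post = child["data"]
        # .get() yields None for any field a post happens to lack
        rows.append({field: post.get(field) for field in FIELDS})
    return rows


if __name__ == "__main__":
    for row in flatten_listing(SAMPLE):
        print(row["id"], row["title"], row["score"])
```

Flattening to a fixed set of keys gives every post the same columns, which maps directly onto a relational table.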
- Python (JSON data handling)
- PostgreSQL
- Libraries: Pandas, NumPy
- Parsing and handling JSON data
- Database design and management with PostgreSQL
- Data processing and analysis using Python libraries (Pandas, NumPy)
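For the database-design and export steps, here is a minimal sketch of an idempotent load followed by a filtered export query. It uses Python's built-in `sqlite3` with an in-memory database as a stand-in for PostgreSQL so that it runs without a server; the `posts` table layout and the `store_posts` and `top_posts` helpers are hypothetical. The `INSERT ... ON CONFLICT (id) DO NOTHING` clause is accepted by both PostgreSQL and SQLite 3.24 and later, but with psycopg2 the named placeholders would be written `%(id)s` rather than `:id`.

```python
import sqlite3

# Hypothetical flattened posts, e.g. the output of a JSON-parsing step.
POSTS = [
    {"id": "abc123", "subreddit": "python", "title": "Parsing JSON",
     "selftext": "", "score": 42, "num_comments": 7, "created_utc": 1700000000.0},
    {"id": "abc124", "subreddit": "datasets", "title": "Nested data",
     "selftext": "Some body text", "score": 5, "num_comments": 0,
     "created_utc": 1700000100.0},
]

# Illustrative schema (an assumption, not the project's actual design).
# In PostgreSQL, created_utc could instead be a TIMESTAMPTZ set via to_timestamp().
SCHEMA = """
CREATE TABLE IF NOT EXISTS posts (
    id           TEXT PRIMARY KEY,
    subreddit    TEXT NOT NULL,
    title        TEXT NOT NULL,
    selftext     TEXT,
    score        INTEGER,
    num_comments INTEGER,
    created_utc  REAL
)
"""

# Skip rows whose primary key already exists, so reloads do not duplicate data.
INSERT = """
INSERT INTO posts (id, subreddit, title, selftext, score, num_comments, created_utc)
VALUES (:id, :subreddit, :title, :selftext, :score, :num_comments, :created_utc)
ON CONFLICT (id) DO NOTHING
"""


def store_posts(conn: sqlite3.Connection, posts: list[dict]) -> None:
    """Create the table if needed and insert posts, skipping duplicate ids."""
    with conn:  # commits on success, rolls back on error (does not close)
        conn.execute(SCHEMA)
        conn.executemany(INSERT, posts)


def top_posts(conn: sqlite3.Connection, min_score: int) -> list[tuple]:
    """Export posts at or above a score threshold, highest score first."""
    cur = conn.execute(
        "SELECT id, subreddit, title, score FROM posts "
        "WHERE score >= :min_score ORDER BY score DESC",
        {"min_score": min_score},
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_posts(conn, POSTS)
    store_posts(conn, POSTS)  # re-running is harmless: duplicates are skipped
    print(top_posts(conn, min_score=10))
    conn.close()
```

Keying the table on the post `id` means re-running the loader on an overlapping fetch does not create duplicate rows. To continue the analysis in Pandas, the export query can be passed to `pandas.read_sql_query` together with the open connection.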
Team Lead: Daniel Elston
Name | GitHub Handles |
---|---|
Daniel Elston | GitHub D. Elston |
Please feel free to contact me if you have any questions, require any further information or wish to contribute.
Email: delstonds@outlook.com