Dev -> Main #6

Merged: rohanbansal12 merged 18 commits into main from dev on Dec 6, 2021

@rohanbansal12
Contributor

First pass of the entire scraping-to-ingestion pipeline for Discourse, being merged to main.

rohanbansal12 and others added 18 commits December 2, 2021 00:39
Include sample data files and raw scraping script
Remove local file saves and update to save to s3. Remove extra imports and arguments
Add basic commenting for scraper class
Add basic design decisions for later revisiting
Add basic information about raw data structure to the readme
Raw data collection from the discourse API
Add ingest file, with cleaning of raw data into lists of dictionaries and proper keys. Testing with local saves
Move functions into separate file
Defines the database structure for the various tables in the SQL database for discourse
Add saving of cleaned data to s3 and then ingestion into the chainverse database
Add prior missing field to the categories table
Create truncate and ingest function
@rohanbansal12 merged commit 80c53af into main on Dec 6, 2021
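
For readers without access to the diff, the cleaning step named in the commits above ("cleaning of raw data into lists of dictionaries and proper keys") might look roughly like the sketch below. This is only an illustration, not the code merged in this PR: the function name `clean_topics`, the `TOPIC_KEYS` mapping, and the field names are all assumptions chosen for the example, and the S3 and database steps are left out.

```python
from typing import Any, Dict, List

# Hypothetical mapping from raw field names to cleaned column names.
# The real pipeline's keys are not shown in this PR conversation.
TOPIC_KEYS: Dict[str, str] = {
    "id": "topic_id",
    "title": "title",
    "category_id": "category_id",
    "created_at": "created_at",
    "posts_count": "posts_count",
}


def clean_topics(raw_topics: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return one dictionary per raw record, keeping only the mapped fields.

    Fields missing from a raw record are set to None so that every cleaned
    row has the same keys, which keeps a later bulk insert simple.
    """
    cleaned = []
    for raw in raw_topics:
        cleaned.append({new: raw.get(old) for old, new in TOPIC_KEYS.items()})
    return cleaned


if __name__ == "__main__":
    # Made-up sample record, for illustration only.
    sample = [{"id": 1, "title": "Hello", "posts_count": 3, "unused": "x"}]
    for row in clean_topics(sample):
        print(row)
```

Giving every row the same set of keys is one possible design choice; it lets the ingest step treat the cleaned list as a uniform table without per-row checks.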