Refactor dataset to use pandas and cleaner setup. #36
{
  "name": "Core Dataset",
  "image": "mcr.microsoft.com/devcontainers/python:3.11",
  "features": {
    "ghcr.io/stuartleeks/dev-container-features/shell-history:0": {}
  }
}
With this, you can run the pipeline in your browser using GitHub Codespaces.
It's a simple way to make future contributors' lives easier, since getting a development environment is one click away.
    branches: ["master"]
  pull_request:
    branches: ["master"]
  workflow_dispatch:
Make it possible to trigger the workflow from the UI.
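Besides the "Run workflow" button in the Actions tab, a `workflow_dispatch` trigger also lets the workflow be started through GitHub's REST API (the "create a workflow dispatch event" endpoint). The sketch below only builds that request with the standard library and does not send it; the owner, repository, workflow filename and token are hypothetical placeholders, and `master` matches the branch in the config above.

```python
import json
import urllib.request


def build_dispatch_request(owner: str, repo: str, workflow_file: str,
                           ref: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST to GitHub's workflow-dispatch endpoint."""
    url = (f"https://api.github.com/repos/{owner}/{repo}"
           f"/actions/workflows/{workflow_file}/dispatches")
    # The JSON body names the branch or tag the workflow should run on.
    body = json.dumps({"ref": ref}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical placeholder values. Passing `req` to urllib.request.urlopen()
# would actually send it; GitHub replies 204 No Content on success.
req = build_dispatch_request("OWNER", "REPO", "WORKFLOW.yml", "master", "TOKEN")
print(req.get_method(), req.full_url)
```

Building the request separately from sending it keeps the sketch testable offline; a real script would read the token from an environment variable rather than hard-coding it.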
name: s-and-p-500-companies
title: S&P 500 Companies with Financial Information
version: "2.0"
licenses:
  - name: ODC-PDDL-1.0
    path: http://opendatacommons.org/licenses/pddl/
    title: Open Data Commons Public Domain Dedication and License v1.0
resources:
  - name: constituents
    path: data/constituents.csv
    format: csv
    mediatype: text/csv
    schema:
      fields:
        - name: Symbol
          type: string
        - name: Security
          type: string
        - name: GICS Sector
          type: string
        - name: GICS Sub-Industry
          type: string
        - name: Headquarters Location
          type: string
        - name: Date added
          type: string
        - name: CIK
          type: string
        - name: Founded
          type: string
Moved to YAML since it is more readable, and it is what modern Frictionless data packages on GitHub use.
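One practical payoff of keeping the schema in a plain data file is that the CSV can be checked against it cheaply. A minimal sketch using only the standard library, assuming the header row of data/constituents.csv should list exactly the schema's fields in the same order; the field names are copied from the YAML above, while `check_header` is a hypothetical helper. The Frictionless tooling itself offers fuller validation (types, constraints) than this header check.

```python
import csv
import io

# Field names copied from the `constituents` schema in the data package.
SCHEMA_FIELDS = [
    "Symbol", "Security", "GICS Sector", "GICS Sub-Industry",
    "Headquarters Location", "Date added", "CIK", "Founded",
]


def check_header(csv_file, expected=SCHEMA_FIELDS):
    """Return a list of problems found in the CSV header (empty if it matches)."""
    header = next(csv.reader(csv_file), [])  # first row, or [] for an empty file
    problems = []
    missing = [f for f in expected if f not in header]
    extra = [h for h in header if h not in expected]
    if missing:
        problems.append(f"missing columns: {missing}")
    if extra:
        problems.append(f"unexpected columns: {extra}")
    if not missing and not extra and header != expected:
        problems.append("columns present but in a different order")
    return problems


# In the repo this would be open("data/constituents.csv", newline="");
# an in-memory header keeps the sketch self-contained.
sample = io.StringIO(",".join(SCHEMA_FIELDS) + "\n")
print(check_header(sample))  # → []
```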
@@ -1 +1,3 @@
beautifulsoup4
pandas
I get it! Though pandas is so heavy-duty (you have to install NumPy, right?). I wonder if we can get away with something more lightweight.
Let's live with it for now!
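For comparison with the "something more lightweight" idea: if a step only needs to write rows out as CSV, the standard library's `csv` module can do it without pulling in pandas and NumPy. A minimal sketch, assuming the rows have already been scraped into dictionaries keyed by the schema's column names; `write_constituents` and the placeholder row are hypothetical, not taken from the PR.

```python
import csv
import io

# Column order copied from the `constituents` schema.
COLUMNS = [
    "Symbol", "Security", "GICS Sector", "GICS Sub-Industry",
    "Headquarters Location", "Date added", "CIK", "Founded",
]


def write_constituents(rows, out):
    """Write rows (dicts keyed by column name) as CSV with a fixed header."""
    # restval="" fills columns a row lacks; extrasaction="raise" rejects
    # keys that are not in the schema instead of silently dropping them.
    writer = csv.DictWriter(out, fieldnames=COLUMNS, restval="",
                            extrasaction="raise")
    writer.writeheader()
    writer.writerows(rows)


# Placeholder row; in the repo `out` would be
# open("data/constituents.csv", "w", newline="").
buf = io.StringIO()
write_constituents([{"Symbol": "AAA", "Security": "Example Corp"}], buf)
print(buf.getvalue())
```

The trade-off is the one discussed here: the stdlib version has no extra dependencies, while pandas makes reshaping and cleaning the scraped table easier to read.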
Agreed. In this case I think it is heavyweight for the computer but easy for humans to understand.
Amazing job. Merging. 👏
Heya @rufuspollock!
Since this seems to be the most-starred package and it was also broken, I thought I would update and improve it, having spent some time with Frictionless and other data package managers.
It changes lots of things, so please push back on anything that doesn't make sense.