Add Wikipedia processing and report #235
@TimidRobot I am considering classifying the languages by region. Would that be meaningful?
The data file doesn't have any region information, and the processing phase shouldn't fetch any additional information. If region data is wanted, the fetch script needs to be updated. That said, given the various diasporas, I'm skeptical that region information is helpful.
Ohh okay.
I also think you should update and reorder plots:
We use sentence case for titles and headings because we generally follow the Google developer documentation style guide.
Sentence case improves readability and allows consistent capitalization (otherwise it becomes hard to remember which words to capitalize). See also: Documentation Guidelines, Creative Commons Open Source
TimidRobot left a comment:
Good work, thank you!
Fixes
Description
Added Python scripts for processing and reporting Wikipedia data.
Covered analysis of the top 10 languages by usage, classification of represented and underrepresented languages, the average article count per language, and the percentage of all Wikipedia articles that belong to the top 10 languages.
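For reviewers, the four metrics above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `summarize_language_counts`, the input shape (a dict of language name to article count), and the rule that a language is "underrepresented" when its count falls below the average are all assumptions made for the example.

```python
def summarize_language_counts(counts, top_n=10):
    """Summarize article counts per language.

    `counts` is a hypothetical dict mapping language name -> article count.
    """
    # Rank languages from most to fewest articles.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    top = ranked[:top_n]

    total = sum(counts.values())
    average = total / len(counts) if counts else 0.0

    # Assumption: "underrepresented" means below the average article count.
    represented = [lang for lang, n in ranked if n >= average]
    underrepresented = [lang for lang, n in ranked if n < average]

    # Share of all articles that belong to the top-N languages.
    top_total = sum(n for _, n in top)
    top_share_percent = 100 * top_total / total if total else 0.0

    return {
        "top": top,
        "average": average,
        "represented": represented,
        "underrepresented": underrepresented,
        "top_share_percent": top_share_percent,
    }


# Made-up numbers, for illustration only.
sample = {"English": 700, "German": 200, "Yoruba": 100}
summary = summarize_language_counts(sample, top_n=2)
```

The average-based threshold is only one possible cutoff; a median or a fixed count would work the same way with a one-line change.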
Checklist
Update index.md).mainormaster).visible errors.
Developer Certificate of Origin
For the purposes of this DCO, "license" is equivalent to "license or public domain dedication," and "open source license" is equivalent to "open content license or public domain dedication."