Graph displaying topics vs. time #111

Open
dvmorozov opened this issue Dec 5, 2022 · 0 comments
dvmorozov commented Dec 5, 2022

Solution

  1. Implement a Python script that splits the corpus into monthly subsets, using the metadata to assign each article to a month. ✔️
  2. Implement a Python script that mines topics month by month and writes the results to a JavaScript file used as graph data. ✔️
  3. Count and display the missing, skipped (by version), and copied files. Write the missing files to a CSV file together with the article ID and version.
  4. Print the estimated time to process all months.
  5. Implement a Python script that saves article identifiers as a set of JSON files for parallel mining. 💡
  6. Implement a Python script that mines topics for a single month, passed as a script parameter, for parallel mining. 💡
  7. Print the estimated time to process the metadata. ❌ Not possible with ijson, because it does not provide the number of articles in advance.
  8. Implement a Python script for parallel topic mining on the partitioned corpus. 💡
  9. Mine topics for the last year. 💡
  10. Group articles by metadata topics (abbreviations). 💡
  11. Compute the number of articles per topic from the metadata over time. Use this as graph data. 💡
  12. The number of topics and the number of items per topic should be adjustable.
  13. Use a stream graph.
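
A minimal sketch of items 1, 10, 11 and 12, to make the intended data flow concrete. It is not the project's actual code: the record fields `update_date` and `categories`, the function names, and the output variable `topicsByMonth` are all assumptions made for illustration. The input can be any iterable of metadata records, so a streaming parser such as ijson could feed it without loading the whole file.

```python
import json
from collections import Counter, defaultdict
from datetime import datetime


def month_key(record):
    """Return 'YYYY-MM' for a record (assumes an ISO 'update_date' field)."""
    return datetime.strptime(record["update_date"], "%Y-%m-%d").strftime("%Y-%m")


def count_topics_by_month(records):
    """Count articles per metadata topic for each month.

    Assumes 'categories' is a space-separated string of topic abbreviations.
    """
    counts = defaultdict(Counter)
    for record in records:
        month = month_key(record)
        for topic in record["categories"].split():
            counts[month][topic] += 1
    return counts


def to_graph_rows(counts, top_n=10):
    """Keep the top_n most frequent topics overall and build one row per month."""
    totals = Counter()
    for month_counts in counts.values():
        totals.update(month_counts)
    topics = [topic for topic, _ in totals.most_common(top_n)]
    rows = [
        {"month": month, **{t: counts[month].get(t, 0) for t in topics}}
        for month in sorted(counts)
    ]
    return topics, rows


def write_js(rows, path, variable="topicsByMonth"):
    """Write rows as a JavaScript constant that a graph page can load."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"const {variable} = {json.dumps(rows, indent=2)};\n")
```

Each row holds one month and the article count for every selected topic, which is the wide layout a stream graph typically consumes. The `top_n` parameter is one way to make the number of displayed topics adjustable.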

Related

  1. Topic mining #109.
  2. Graph displaying rate of adding articles by topic #83.
  3. Graph displaying article topics #74.
dvmorozov added the feature (New feature) label Dec 5, 2022
dvmorozov self-assigned this Dec 5, 2022
dvmorozov added a commit that referenced this issue Feb 8, 2023
dvmorozov added a commit that referenced this issue Feb 9, 2023
dvmorozov added a commit that referenced this issue Feb 15, 2023
dvmorozov added a commit that referenced this issue Feb 18, 2023
dvmorozov added a commit that referenced this issue Feb 18, 2023
dvmorozov added a commit that referenced this issue Feb 19, 2023
dvmorozov added a commit that referenced this issue Feb 24, 2023
dvmorozov added a commit that referenced this issue Feb 27, 2023
dvmorozov added a commit that referenced this issue Feb 28, 2023
dvmorozov added a commit that referenced this issue Feb 28, 2023
dvmorozov added a commit that referenced this issue Feb 28, 2023
dvmorozov added a commit that referenced this issue Mar 1, 2023
dvmorozov added a commit that referenced this issue Mar 3, 2023
dvmorozov added a commit that referenced this issue Mar 4, 2023