
553 add routeviews data #639

Closed
wants to merge 28 commits into from


trdavidt
Contributor

Closes #553. Adding the routeviews data works with the updated Makefile and uses three scripts. scripts/externallinks_placeholder.py checks for duplicate papers across the two input YAML files and merges them.
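A minimal sketch of the duplicate-merging step. It assumes both YAML files have already been loaded into lists of dicts with a "title" key, that duplicates are detected by a case- and whitespace-normalized title, and that a duplicate is merged by taking the union of keys. The function names are hypothetical and are not taken from scripts/externallinks_placeholder.py.

```python
from typing import Any, Dict, List, Tuple

Paper = Dict[str, Any]


def normalize_title(title: str) -> str:
    """Assumed duplicate key: case-folded title with collapsed whitespace."""
    return " ".join(title.casefold().split())


def merge_papers(primary: List[Paper], extra: List[Paper]) -> Tuple[List[Paper], int]:
    """Merge `extra` into `primary`; return the merged list and duplicate count.

    For a duplicate, fields already in the `primary` record are kept and
    fields that only appear in the `extra` record are added (union of keys).
    """
    merged: Dict[str, Paper] = {}
    for paper in primary:
        merged[normalize_title(paper["title"])] = dict(paper)

    duplicates = 0
    for paper in extra:
        key = normalize_title(paper["title"])
        if key in merged:
            duplicates += 1
            # Only fill in fields the primary record does not already have.
            for field, value in paper.items():
                merged[key].setdefault(field, value)
        else:
            merged[key] = dict(paper)
    return list(merged.values()), duplicates
```

Letting the existing data-papers.yaml entry win on conflicting keys is a design choice in this sketch; preferring the routeviews value instead would only mean overwriting rather than calling setdefault.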

trdavidt and others added 25 commits April 6, 2023 21:07
@bhuffaker
Member

I get the following error messages when I try to make it:

python3 scripts/externallinks_placeholder.py data/data-papers.yaml data/data-papers-routeviews.yaml
    loading data/data-papers.yaml
    loading data/data-papers-routeviews.yaml
    found 69 duplicates
unparseable "" in "Citadels in cyberspace"
unparseable "" in "CAIDA Macroscopic IP Topology Data Kit (ITDK) #0204 provided to the Network Modeling and Simulation (NMS) community under DARPA grant N66001-01-1-8909"
unparseable "" in "The Internet Under Crisis Conditions: Learning from September 11"
unparseable "" in "ISMA Winter 2000 Workshop - Final Report"
make[2]: *** No rule to make target `routerviews', needed by `run'.  Stop.
make[1]: *** [fast] Error 2
make: *** [readable] Error 2

@trdavidt
Contributor Author

I believe this is a typo in the Makefile. The routerviews target is no longer needed and has been removed. Should I also fix the "unparseable" errors above? They are caused by missing author information for these papers on routeviews.org, which is where I scraped the data used to generate the routeviews yaml.

@bhuffaker
Member

bhuffaker commented Jun 22, 2023 via email

@trdavidt
Contributor Author

trdavidt commented Jun 27, 2023

It looks like there's no way to fix the remaining unparseable errors without editing the data-papers-routeviews.yaml that gets generated by scripts/external_routeviews_parse.py.

For example, the first unparseable paper has the authors Rawat; Madhur; Chakravarty; Sambuddho on the routeviews website. Read literally, this looks like four authors, and there is no way to systematically recognize that it actually has two authors without searching online and manually changing the yaml. Similarly, another paper has the author "Unknown". However, the routeviews yaml should not be committed (#553). What should I do?
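To illustrate the ambiguity, here is a hypothetical pair of helpers: one splits the routeviews author string on ";" and treats every token as an author, the other pairs the tokens as "Last; First". For an even number of tokens both readings are mechanically valid, so nothing in the string itself says which one is correct.

```python
from typing import List, Optional


def split_authors(raw: str) -> List[str]:
    """Literal reading: every ';'-separated token is one author."""
    return [part.strip() for part in raw.split(";") if part.strip()]


def pair_last_first(tokens: List[str]) -> Optional[List[str]]:
    """Alternative reading: tokens alternate 'Last; First'.

    Returns None when the token count is odd, since no pairing exists.
    """
    if len(tokens) % 2:
        return None
    return [f"{first} {last}" for last, first in zip(tokens[0::2], tokens[1::2])]


raw = "Rawat; Madhur; Chakravarty; Sambuddho"
tokens = split_authors(raw)
print(len(tokens), "authors if read literally")
print(pair_last_first(tokens), "if read as Last; First pairs")
```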

@bhuffaker
Member

Make a list of papers that are broken. We will email routeviews and have them fix it on their end.
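A possible way to build part of that list automatically, sketched under the assumption that each paper is a dict with "title" and "authors" keys and that an empty value or "Unknown" marks a broken author list. The names are hypothetical. This would catch the "Unknown" case but not a badly formatted list like the one above, whose tokens all look like valid names, so such entries still have to be added by hand.

```python
from typing import Any, Dict, List

# Assumed markers of a missing author list (compared case-insensitively).
PLACEHOLDER_AUTHORS = {"", "unknown"}


def find_broken_papers(papers: List[Dict[str, Any]]) -> List[str]:
    """Return titles whose authors are missing or only placeholders."""
    broken = []
    for paper in papers:
        authors = paper.get("authors") or []
        if isinstance(authors, str):  # tolerate a single author string
            authors = [authors]
        names = [str(a).strip().casefold() for a in authors]
        if not names or all(name in PLACEHOLDER_AUTHORS for name in names):
            broken.append(paper.get("title", "<untitled>"))
    return broken
```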

@trdavidt
Contributor Author

trdavidt commented Jul 3, 2023

python3 scripts/externallinks_placeholder.py data/data-papers.yaml data/data-papers-routeviews.yaml
    loading data/data-papers.yaml
    loading data/data-papers-routeviews.yaml
    found 69 duplicates
unparseable "" in "Citadels in cyberspace"
unparseable "" in "CAIDA Macroscopic IP Topology Data Kit (ITDK) #0204 provided to the Network Modeling and Simulation (NMS) community under DARPA grant N66001-01-1-8909"
unparseable "" in "The Internet Under Crisis Conditions: Learning from September 11"
unparseable "" in "ISMA Winter 2000 Workshop - Final Report"

Of these unparseable papers, the following two should be corrected if possible:
(1) "Citadels in cyberspace": badly formatted authors (see prev comment)
(2) "The Internet Under Crisis Conditions: Learning from September 11": author is "Unknown"

Both of the remaining unparseable papers have the author "CAIDA". I fixed the issue with the "CAIDA" author by correcting my script that generates the routeviews yaml. The external placeholder script handles single-name authors just fine.

Happy Fourth of July!

@bhuffaker
Member

The CAIDA papers should be matched against the papers generated from pubdb. You will need to change the Makefile so that it generates the papers from pubdb before your code is called. Your code will then need to check the files generated in sources/papers and not ignore duplicates.

@trdavidt
Contributor Author

trdavidt commented Jul 3, 2023

What do you mean by "not ignore duplicates"? What should I do if there is a duplicate paper? Would this be similar to merging duplicates as we discussed before (taking the union of keys)?

@bhuffaker
Member

Actually, just skip those papers for now.

@trdavidt
Contributor Author

trdavidt commented Jul 5, 2023

OK, the papers with the "CAIDA" author should be skipped now. There should be only two unparseable papers left.

(Edit: the push is not showing up here, but I did push to the 553 branch.)
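A sketch of what skipping these papers could look like, assuming each paper is a dict whose "authors" value is either a list or a single string, and that "CAIDA" is matched case-insensitively. The helper name is hypothetical and not necessarily how the pushed change implements it.

```python
from typing import Any, Dict, Iterable, Iterator

# Assumed: organization names whose papers are skipped for now.
SKIPPED_AUTHORS = {"caida"}


def skip_org_authored(papers: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield only papers that have no author in SKIPPED_AUTHORS."""
    for paper in papers:
        authors = paper.get("authors") or []
        if isinstance(authors, str):
            authors = [authors]
        if any(str(a).strip().casefold() in SKIPPED_AUTHORS for a in authors):
            continue  # skipped for now; these should eventually match pubdb papers
        yield paper
```

Filtering with a generator keeps the skip separate from parsing, so it could later be replaced by a lookup against the pubdb-generated papers without touching the rest of the script.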

@bhuffaker
Member

bhuffaker commented Jul 5, 2023 via email

@trdavidt
Contributor Author

trdavidt commented Jul 6, 2023

They are:

  • CAIDA Macroscopic IP Topology Data Kit (ITDK)...
  • ISMA Winter 2000 Workshop - Final Report

@bhuffaker
Member

Do you mean you don't know what to map them to?

  • CAIDA Macroscopic IP Topology Data Kit (ITDK)...
  • ISMA Winter 2000 Workshop - Final Report

@trdavidt
Contributor Author

trdavidt commented Jul 8, 2023

I think I am misunderstanding your previous comments. This is how I interpreted our discussion:

  • Initially, my script for generating the routeviews yaml did not produce the correct author for the two papers mentioned in my previous comment, which resulted in "unparseable..." errors for these papers
  • I fixed this bug in an earlier commit (commit A)
  • In a previous comment, you mentioned that we should skip these papers for now
  • I made another change (commit B) to skip these papers; they should no longer be included in the generated yaml

With commit A: The two papers are included in the routeviews yaml. Then, they will be parsed correctly by the placeholder script and placeholder objects will be created for them.

With commit B: These papers are effectively ignored.

Is there something else missing that should be done for this issue?

@bhuffaker
Member

Please remove debugging error messages:

Matching up Papers with media/presentations with the same name
tag:slides
tag:slides
tag:slides
tag:slides
tag:slides
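One way to keep diagnostics like these without printing them on every make run is to route them through Python's standard logging module at DEBUG level instead of printing them directly. This is a minimal sketch; the logger name and function are hypothetical.

```python
import logging

# Hypothetical logger name for the placeholder script.
logger = logging.getLogger("externallinks_placeholder")


def report_tag_match(tag: str) -> None:
    """Record a matched tag; hidden unless DEBUG logging is enabled."""
    logger.debug("tag:%s", tag)
```

A command-line flag (also hypothetical) could call logging.basicConfig(level=logging.DEBUG) to turn these messages back on when debugging.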

Resolve error messages:

python3 scripts/externallinks_placeholder.py data/data-papers.yaml data/data-papers-routeviews.yaml
    loading data/data-papers.yaml
    loading data/data-papers-routeviews.yaml
    found 69 duplicates
unparseable "" in "Citadels in cyberspace"
unparseable "" in "The Internet Under Crisis Conditions: Learning from September 11"

bhuffaker closed this Jul 14, 2023
Development

Successfully merging this pull request may close these issues.

add routeviews data
2 participants