[Springer] Paper title and journal name #38

Closed
zhugeyicixin opened this issue May 2, 2019 · 3 comments

Comments

@zhugeyicixin

  1. It is natural to expect the paper title/journal name to be a string rather than a list, and we have already discussed this in the PR comments.

  2. Some unusual pages contain several paper titles, for example:
    10.1007/BF01161620
    10.1007/s10230-014-0302-8

  3. The parser needs to be fixed for some journals, for example:

10.1007/s10562-004-3745-x: parsed Journal is ['Catalysis Letters', 'J. Catal.', 'J. Am. Chem. Soc.', 'J. Phys. Chem.', 'Catal. Lett.', 'Angew. Chem. Int. Edn.', 'J. Ind. Rng. Chem.', 'J. Catal.']

10.1007/s11244-005-2883-8: parsed Journal is ['Topics in Catalysis', 'Stud. Surf. Sci. Catal.', 'Appl. Catal. A: General', 'Stud. Surf. Sci. Catal.', 'Top Catal.', 'J. Phys. Chem.', 'Top. Catal.', 'Top. Catal.', 'Stud. Surf. Sci. Catal.', 'Stud. Surf. Sci. Catal.', 'Brennstoff-Chem.', 'Angew. Chem.', 'Stud. Surf. Sci. Catal.', 'Stud. Surf. Sci. Catal.', 'Stud. Surf. Sci. Catal.', 'Catalysis Today', 'Fuel Process Technol.', 'Appl. Cat. A: General', 'CIT']
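In both records the first entry looks like the journal itself, while the remaining entries look like abbreviated journal names picked up from elsewhere on the page (this reading is an assumption, not something confirmed in this thread). A minimal sketch for flagging such records could look like the following; `find_suspicious_journals` and the shortened `parsed` mapping are hypothetical names used only for illustration:

```python
from typing import Dict, List


def find_suspicious_journals(parsed: Dict[str, List[str]]) -> List[str]:
    """Return the DOIs whose parsed Journal field holds more than one entry."""
    return [doi for doi, journals in parsed.items() if len(journals) > 1]


# Hypothetical, shortened records modeled on the two examples above.
parsed = {
    "10.1007/s10562-004-3745-x": ["Catalysis Letters", "J. Catal.", "J. Am. Chem. Soc."],
    "10.1007/s11244-005-2883-8": ["Topics in Catalysis", "Stud. Surf. Sci. Catal."],
    "10.0000/hypothetical-clean-record": ["Catalysis Letters"],
}

suspicious = find_suspicious_journals(parsed)
```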

So I think maybe we should:

  1. Change the type of Journal and Title from list to str
  2. Maybe drop the HTML files that contain several titles, if they are useless?
  3. Fix the parser for Journal if we want to keep this field. Since the Journal name is already known during scraping, we could also skip parsing Journal altogether.
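The first and third proposals could be sketched roughly as below. This is only an illustration under stated assumptions: `to_single_str` and `pick_journal` are hypothetical helpers, not part of the existing parser, and keeping the first list entry is a guess based on the two Springer examples above.

```python
from typing import List, Optional, Union


def to_single_str(value: Union[None, str, List[str]]) -> Optional[str]:
    """Collapse a parsed field to a str, or None when nothing usable is present."""
    if value is None:
        return None
    if isinstance(value, str):
        value = [value]
    # Drop empty or whitespace-only entries, then keep the first one (assumption).
    cleaned = [v.strip() for v in value if v and v.strip()]
    return cleaned[0] if cleaned else None


def pick_journal(
    parsed: Union[None, str, List[str]], known: Optional[str] = None
) -> Optional[str]:
    """Prefer the journal name known at scraping time over the parsed one."""
    return known if known else to_single_str(parsed)
```

With this, `pick_journal(['Catalysis Letters', 'J. Catal.'])` yields a single string, and passing `known=` bypasses the parsed value entirely.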

What do you think? @IAmGrootel @hhaoyan

@hhaoyan
Contributor

hhaoyan commented May 13, 2019

Yes, title and journal name should each be either str or None. This is being fixed now and will be included in an upcoming version.

@hhaoyan
Contributor

hhaoyan commented May 15, 2019

Just for tracking:

  • RSC
  • ECS
  • Nature
  • Springer
  • Wiley
  • Elsevier
  • ACS

@hhaoyan
Contributor

hhaoyan commented Jun 10, 2019

solved

hhaoyan closed this as completed Jun 10, 2019