[Springer] Paper title and journal name #38

zhugeyicixin · 2019-05-02T01:45:10Z

It is natural to expect the paper title and journal name to be strings rather than lists, as we already discussed in the PR comments.
Some pages contain several paper titles, for example:
10.1007/BF01161620
10.1007/s10230-014-0302-8
The Journal parser also needs fixing for some journals, for example:

10.1007/s10562-004-3745-x: parsed Journal is ['Catalysis Letters', 'J. Catal.', 'J. Am. Chem. Soc.', 'J. Phys. Chem.', 'Catal. Lett.', 'Angew. Chem. Int. Edn.', 'J. Ind. Rng. Chem.', 'J. Catal.']

10.1007/s11244-005-2883-8: parsed Journal is ['Topics in Catalysis', 'Stud. Surf. Sci. Catal.', 'Appl. Catal. A: General', 'Stud. Surf. Sci. Catal.', 'Top Catal.', 'J. Phys. Chem.', 'Top. Catal.', 'Top. Catal.', 'Stud. Surf. Sci. Catal.', 'Stud. Surf. Sci. Catal.', 'Brennstoff-Chem.', 'Angew. Chem.', 'Stud. Surf. Sci. Catal.', 'Stud. Surf. Sci. Catal.', 'Stud. Surf. Sci. Catal.', 'Catalysis Today', 'Fuel Process Technol.', 'Appl. Cat. A: General', 'CIT']

So I suggest that we:

Change the type of Journal and Title from list to str
Possibly discard HTML files that contain several titles, if they turn out to be useless?
Fix the Journal parser if we want to keep this field. Alternatively, since the journal name is already known at scraping time, we could skip parsing Journal altogether.
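As a rough illustration of the first and third points, a parsed list could be collapsed into a single str or None along these lines. This is a minimal sketch, not the project's code: `normalize_field` and its `known` parameter (the journal name already available from scraping) are hypothetical, and returning None when several distinct titles are found is one assumed policy for flagging pages that might be discarded.

```python
from typing import List, Optional


def normalize_field(values: List[str], known: Optional[str] = None) -> Optional[str]:
    """Collapse a parsed list field (e.g. Title or Journal) into str or None.

    Hypothetical helper: `known` is a value already available at scraping
    time (e.g. the journal being crawled); if given, it wins over parsing.
    """
    if known is not None:
        return known
    # Strip whitespace, drop empty strings, and de-duplicate in first-seen order.
    distinct = list(dict.fromkeys(v.strip() for v in values if v and v.strip()))
    if len(distinct) == 1:
        return distinct[0]
    # Zero or several distinct values: ambiguous, so return None and let the
    # caller decide whether to discard the page.
    return None
```

With `known` set, a noisy Journal list like the ones above resolves to the scraped journal name, so the parser's output no longer matters for that field.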

What do you think? @IAmGrootel @hhaoyan


hhaoyan · 2019-05-13T19:27:16Z

Yes, title and journal name should be either str or None. This is being fixed now and will be included in an upcoming version.
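One way to enforce the str-or-None contract is to validate the fields when the metadata object is built. The sketch below assumes a hypothetical `PaperMeta` dataclass; the actual field names and data model in the repository may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PaperMeta:
    """Hypothetical container for parsed paper metadata."""
    doi: str
    title: Optional[str] = None
    journal: Optional[str] = None

    def __post_init__(self) -> None:
        # Enforce the agreed contract: each field is a str or None, never a list.
        for name in ("title", "journal"):
            value = getattr(self, name)
            if value is not None and not isinstance(value, str):
                raise TypeError(f"{name} must be str or None, got {type(value).__name__}")
```

Passing a list, as the old parser produced, now fails immediately with a `TypeError` instead of silently propagating downstream.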

hhaoyan · 2019-05-15T19:44:40Z

hhaoyan · 2019-06-10T18:14:51Z

Solved.

hhaoyan closed this as completed Jun 10, 2019
