Let's start looking for classic citations in the Europe PMC database.

Once more, we will use naive definitions of citation classics. 

Classics are all papers cited at least 150 times.

Big classics: at least 500 citations

Medium classics: at least 300 citations, less than 500

Small classics: at least 150 citations, less than 300
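
The tiers can be written as a small function. This is only a sketch: it assumes the lower bound of each tier is inclusive (a paper with exactly 500 citations counts as a big classic), which matches the `>=` filters used in the code cells further down. `classify_classic` is a hypothetical helper, not part of the notebook.

```python
def classify_classic(cited_by_count):
    """Map a citation count to a tier name, or None if it is not a classic."""
    if cited_by_count >= 500:
        return "big"
    if cited_by_count >= 300:
        return "medium"
    if cited_by_count >= 150:
        return "small"
    return None


# Check the behaviour right at the tier boundaries.
for count in (149, 150, 299, 300, 499, 500):
    print(count, classify_classic(count))
```

Checking the boundaries explicitly makes it easy to see that every count from 150 upwards lands in exactly one tier.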


In [1]:
import pandas as pd
import requests

# Europe PMC REST search: the 100 most-cited hits for "single-cell RNA-seq".
pmc_query = "https://www.ebi.ac.uk/europepmc/webservices/rest/search?query=%22single-cell%20RNA-seq%22&resultType=lite&cursorMark=*&pageSize=100&sort=CITED%20desc&format=json"
page = requests.get(pmc_query)
page.raise_for_status()  # fail early on an HTTP error

data = page.json()
# pandas.io.json.json_normalize is deprecated; pd.json_normalize is the current entry point.
df = pd.json_normalize(data['resultList']['result'])


single_cell_papers = df[['pmid', 'title', 'authorString', 'journalTitle', 'citedByCount', 'pubYear']]



In [11]:
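def get_first_author(author_string):
    """Return the first name in a comma-separated author string."""
    return author_string.split(',')[0]

# assign() returns a new DataFrame, which avoids a SettingWithCopyWarning.
first_authors = single_cell_papers['authorString'].apply(get_first_author)
single_cell_papers = single_cell_papers.assign(firstAuthor=first_authors)
single_cell_papers = single_cell_papers[single_cell_papers['citedByCount'] >= 150]

# Keep at most the 67 most-cited entries (the query sorts by citation count).
single_cell_papers = single_cell_papers.head(67)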

Cool, now we have our dataset of interest: the articles about "single-cell RNA-seq" with at least 150 citations, taken from the 100 most-cited hits that Europe PMC returns for the query.
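
The query asks for a single page of 100 results, so any classics beyond that first page are not retrieved. The `cursorMark` parameter already present in the query URL is meant for paging, and the loop below is a minimal sketch of following cursors until the citation count drops below the threshold. It assumes that each response carries a `nextCursorMark` field alongside `resultList`, and that results stay sorted by citation count. `collect_classics` and `fetch_page` are hypothetical names: `fetch_page` stands in for a function that issues the HTTP request for a given cursor, so the paging logic can be tried without network access.

```python
def collect_classics(fetch_page, min_citations=150):
    """Follow cursor marks until citation counts drop below the threshold."""
    classics = []
    cursor = "*"  # Europe PMC uses "*" for the first page
    while True:
        payload = fetch_page(cursor)
        results = payload.get("resultList", {}).get("result", [])
        if not results:
            break
        for record in results:
            if int(record.get("citedByCount", 0)) < min_citations:
                # Results are sorted by citations, so nothing later qualifies.
                return classics
            classics.append(record)
        next_cursor = payload.get("nextCursorMark")  # assumed field name
        if not next_cursor or next_cursor == cursor:
            break
        cursor = next_cursor
    return classics


# Fake two-page response used in place of real HTTP calls.
fake_pages = {
    "*": {"nextCursorMark": "page2",
          "resultList": {"result": [{"citedByCount": 900}, {"citedByCount": 400}]}},
    "page2": {"nextCursorMark": "page3",
              "resultList": {"result": [{"citedByCount": 200}, {"citedByCount": 100}]}},
}
print(len(collect_classics(lambda cursor: fake_pages[cursor])))
```

In the notebook, `fetch_page` would wrap `requests.get` with the `cursorMark` value substituted into the query string.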

The goal now is to turn this table into a user-friendly Markdown list.

We will start by creating a base Markdown file.


In [12]:
import pandas as pd

# Classics saved by the previous run; used further down to flag new entries.
last_week_classics = pd.read_csv('current_papers.csv')

last_week_classics



from mdutils.mdutils import MdUtils
mdFile = MdUtils(file_name='README', title='Classic citations from the Europe PMC database')

mdFile.new_header(level=1, title='Overview')
mdFile.new_paragraph("Let's start looking for classic citations in the Europe PMC database. Once more, we will use naive definitions of citation classics.")

mdFile.new_line()
mdFile.new_line("* Classics are all papers cited at least 150 times.")
mdFile.new_line("* Big classics: at least 500 citations")
mdFile.new_line("* Medium classics: at least 300 citations, less than 500")
mdFile.new_line("* Small classics: at least 150 citations, less than 300\n")


mdFile.new_line("Currently, we have " + str(single_cell_papers.shape[0]) + " classics that meet the criteria above:")
mdFile.new_line()


"\n# Overview\n\n\nLet's start lookinf for classic citations at the EuropePMC database.Once more, we will use naive definitions of citation classics.  \n  \n* Classics are all cited more than 150 times.  \n* Big classics: more than 500 citations  \n* Medium classics: at least 300 citations, less than 500  \n* Small classics: at least 150 citations, less than 300\n  \nCurrently, we have 5 classics that meet the criteria above:  \n"

In [29]:
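def get_years(papers):
    """Return the distinct publication years, newest first."""
    return sorted(set(papers['pubYear']), reverse=True)


def get_new_entry(row):
    """Format one paper as a Markdown bullet linking to its Europe PMC page."""
    return ("* " + row['firstAuthor'] + " et al, " + "[" + row['title'] + "]" +
            "(" + "https://europepmc.org/article/MED/" + str(row['pmid']) + ")")


def add_section_to_md(papers, mdFile):
    """Add the papers to mdFile grouped by year, flagging entries not seen last week."""
    papers = papers.sort_values('firstAuthor')
    # `in` on a pandas Series tests the index, not the values, so compare against a set.
    known_titles = set(last_week_classics['title'])

    for year in get_years(papers):
        mdFile.new_header(3, str(year))
        papers_in_year = papers[papers["pubYear"] == year]
        for index, row in papers_in_year.iterrows():
            new_entry = get_new_entry(row)
            if row['title'] in known_titles:
                mdFile.new_line(new_entry)
            else:
                mdFile.new_line(new_entry + ' [NEW]')
        mdFile.new_line()

    return mdFile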
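
The grouping and flagging logic can also be tried without pandas or mdutils. The sketch below works on plain dictionaries and returns the Markdown lines instead of writing them to a file. `render_by_year` is a hypothetical helper, and the trailing ` [NEW]` marker with a leading space is a formatting choice made here. Titles from the previous run are held in a plain set, because `in` applied to a pandas Series tests the index labels rather than the values.

```python
from collections import defaultdict


def render_by_year(papers, previous_titles):
    """Return Markdown lines grouped by year (newest first), flagging unseen titles."""
    seen = set(previous_titles)
    by_year = defaultdict(list)
    for paper in papers:
        by_year[str(paper["pubYear"])].append(paper)

    lines = []
    for year in sorted(by_year, reverse=True):
        lines.append(f"### {year}")
        # Within a year, list papers alphabetically by first author.
        for paper in sorted(by_year[year], key=lambda p: p["firstAuthor"]):
            entry = (f"* {paper['firstAuthor']} et al, [{paper['title']}]"
                     f"(https://europepmc.org/article/MED/{paper['pmid']})")
            if paper["title"] not in seen:
                entry += " [NEW]"
            lines.append(entry)
    return lines


# Two entries taken from the README output shown in this notebook.
example = [
    {"firstAuthor": "Zeisel A", "pubYear": "2015", "pmid": "25700174",
     "title": "Brain structure. Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq."},
    {"firstAuthor": "Klein AM", "pubYear": "2015", "pmid": "26000487",
     "title": "Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells."},
]
previous = ["Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells."]
print("\n".join(render_by_year(example, previous)))
```

Only the Zeisel entry is flagged in this example, since the Klein title appears in `previous`.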
    

In [30]:

mdFile.new_header(1, "Big classics")
big_classics = single_cell_papers[single_cell_papers['citedByCount'] >= 500]
mdFile = add_section_to_md(big_classics, mdFile)

mdFile.new_header(1, "Medium classics")
mid_classics = single_cell_papers[(single_cell_papers['citedByCount'] >= 300)
                                  & (single_cell_papers['citedByCount'] < 500)]
mdFile = add_section_to_md(mid_classics, mdFile)


mdFile.new_header(1, "Small classics")
smol_classics = single_cell_papers[(single_cell_papers['citedByCount'] >= 150)
                                  & (single_cell_papers['citedByCount'] < 300)]
mdFile = add_section_to_md(smol_classics, mdFile)

mdFile.new_header(1, "Contributions")

mdFile.new_line("Want to contribute a classic that I've missed? Great! Just add the classic to a fork, open a pull request, and it is good to go.")




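# Save this run's classics for next week's comparison; index=False avoids a stray index column.
single_cell_papers.to_csv('current_papers.csv', index=False)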


"\n# Overview\n\n\nLet's start lookinf for classic citations at the EuropePMC database.Once more, we will use naive definitions of citation classics.  \n*Classics are all cited more than 150 times.  \n*Big classics: more than 500 citations  \n*Medium classics: at least 300 citations, less than 500  \n*Small classics: at least 150 citations, less than 300\n\n# Big classics\n\n### 2015\n  \n* Klein AM et al, [Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells.](https://europepmc.org/article/MED/26000487)  \n* Macosko EZ et al, [Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets.](https://europepmc.org/article/MED/26000488)  \n* Zeisel A et al, [Brain structure. Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq.](https://europepmc.org/article/MED/25700174)  \n\n### 2014\n  \n* Jaitin DA et al, [Massively parallel single-cell RNA-seq for marker-free decomposition of tissues into cell type

In [31]:
from datetime import date
today = date.today()
today_long_format = today.strftime("%B %d, %Y")
mdFile.new_line()
mdFile.new_header(1, "Last update:")

mdFile.new_line(today_long_format)



In [32]:
mdFile.create_md_file()




<mdutils.fileutils.fileutils.MarkDownFile at 0x7fa02f644320>
