# Program to split each article's cited references into multiple rows

### *Warning:* Articles with no references, or with NaNs in UT, PY, or CR, must be excluded.

In [1]:
# libraries
import pandas as pd

In [7]:
# load the Web of Science export into a dataframe
rdf = pd.read_csv('savedrecs_base.csv')
rdf.head(5)

Unnamed: 0,Publication Type,Authors,Book Authors,Book Editors,Book Group Authors,Author Full Names,Book Author Full Names,Group Authors,Article Title,Source Title,...,Web of Science Index,Research Areas,IDS Number,UT (Unique WOS ID),Pubmed Id,Open Access Designations,Highly Cited Status,Hot Paper Status,Date of Export,Unnamed: 69
0,J,"Price, R; Skopec, M; Mackenzie, S; Nijhoff, C;...",,,,"Price, Robyn; Skopec, Mark; Mackenzie, Simon; ...",,,A novel data solution to inform curriculum dec...,SCIENTOMETRICS,...,Science Citation Index Expanded (SCI-EXPANDED)...,Computer Science; Information Science & Librar...,YJ3GA,WOS:000744420900001,,hybrid,,,2022-02-13,
1,J,"Ma, J; Pan, YH; Su, CY",,,,"Ma, Jing; Pan, Yaohui; Su, Chih-Yi",,,Organization-oriented technology opportunities...,SCIENTOMETRICS,...,Science Citation Index Expanded (SCI-EXPANDED)...,Computer Science; Information Science & Librar...,YH8JZ,WOS:000743408700001,,,,,2022-02-13,
2,J,"Hammami, A; Semmar, N",,,,"Hammami, Asma; Semmar, Nabil",,,The simplex simulation as a tool to reveal pub...,SCIENTOMETRICS,...,Science Citation Index Expanded (SCI-EXPANDED)...,Computer Science; Information Science & Librar...,YK9II,WOS:000720703000001,,,,,2022-02-13,
3,J,"da Silva, JTA; Dunleavy, DJ; Moradzadeh, M; Ey...",,,,"Teixeira da Silva, Jaime A.; Dunleavy, Daniel ...",,,A credit-like rating system to determine the l...,SCIENTOMETRICS,...,Science Citation Index Expanded (SCI-EXPANDED)...,Computer Science; Information Science & Librar...,UW2MR,WOS:000686089100010,34421155.0,"Green Published, Bronze",,,2022-02-13,
4,J,"Faria, JR; Mixon, FG",,,,"Faria, Joao Ricardo; Mixon, Franklin G., Jr.",,,The Marginal Impact of a Publication on Citati...,SCIENTOMETRICS,...,Science Citation Index Expanded (SCI-EXPANDED)...,Computer Science; Information Science & Librar...,UB2BQ,WOS:000664849100001,,,,,2022-02-13,


In [8]:
# Columns before the rename
rdf.columns

Index(['Publication Type', 'Authors', 'Book Authors', 'Book Editors',
       'Book Group Authors', 'Author Full Names', 'Book Author Full Names',
       'Group Authors', 'Article Title', 'Source Title', 'Book Series Title',
       'Book Series Subtitle', 'Language', 'Document Type', 'Conference Title',
       'Conference Date', 'Conference Location', 'Conference Sponsor',
       'Conference Host', 'Author Keywords', 'Keywords Plus', 'Abstract',
       'Addresses', 'Affiliations', 'Reprint Addresses', 'Email Addresses',
       'Researcher Ids', 'ORCIDs', 'Funding Orgs', 'Funding Text',
       'Cited References', 'Cited Reference Count', 'Times Cited, WoS Core',
       'Times Cited, All Databases', '180 Day Usage Count',
       'Since 2013 Usage Count', 'Publisher', 'Publisher City',
       'Publisher Address', 'ISSN', 'eISSN', 'ISBN', 'Journal Abbreviation',
       'Journal ISO Abbreviation', 'Publication Date', 'Publication Year',
       'Volume', 'Issue', 'Part Number', 'Supplement', 

In [9]:
# Rename columns (if necessary)
# 'UT (Unique WOS ID)':'UT'
# 'Cited References':'CR'
# 'Publication Year':'PY'
rdf = rdf.rename({'UT (Unique WOS ID)':'UT',
                  'Cited References':'CR',
                  'Publication Year':'PY'}, axis=1)  # new method
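Whether the rename is needed depends on how the file was exported. A minimal sketch of a check for the three short column names, run here on an empty toy dataframe; `missing_columns` is a hypothetical helper, not part of the original notebook:

```python
import pandas as pd

# Hypothetical helper (not in the original notebook): list the required
# short column names that the dataframe does not have yet.
def missing_columns(df, required=('UT', 'CR', 'PY')):
    return [col for col in required if col not in df.columns]

# Toy dataframe with two long Web of Science names and one short name.
toy = pd.DataFrame(columns=['UT (Unique WOS ID)', 'Cited References', 'PY'])
print(missing_columns(toy))

# After the rename, nothing should be missing.
renamed = toy.rename({'UT (Unique WOS ID)': 'UT',
                      'Cited References': 'CR'}, axis=1)
print(missing_columns(renamed))
```

An empty list means the later cells can refer to `UT`, `CR`, and `PY` directly.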

In [10]:
# Columns after the rename
rdf.columns

Index(['Publication Type', 'Authors', 'Book Authors', 'Book Editors',
       'Book Group Authors', 'Author Full Names', 'Book Author Full Names',
       'Group Authors', 'Article Title', 'Source Title', 'Book Series Title',
       'Book Series Subtitle', 'Language', 'Document Type', 'Conference Title',
       'Conference Date', 'Conference Location', 'Conference Sponsor',
       'Conference Host', 'Author Keywords', 'Keywords Plus', 'Abstract',
       'Addresses', 'Affiliations', 'Reprint Addresses', 'Email Addresses',
       'Researcher Ids', 'ORCIDs', 'Funding Orgs', 'Funding Text', 'CR',
       'Cited Reference Count', 'Times Cited, WoS Core',
       'Times Cited, All Databases', '180 Day Usage Count',
       'Since 2013 Usage Count', 'Publisher', 'Publisher City',
       'Publisher Address', 'ISSN', 'eISSN', 'ISBN', 'Journal Abbreviation',
       'Journal ISO Abbreviation', 'Publication Date', 'PY', 'Volume', 'Issue',
       'Part Number', 'Supplement', 'Special Issue', 'Meeting Ab

In [15]:
# Remove rows with NaNs in CR, PY, or UT (row count printed before and after)
print(len(rdf))
rdf = rdf[rdf['CR'].notna()]
rdf = rdf[rdf['PY'].notna()]
rdf = rdf[rdf['UT'].notna()]
print(len(rdf))

5807
5727
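The three filters can also be written as a single `dropna` call with the `subset` argument. A minimal sketch on a toy dataframe; the sample values are invented for illustration:

```python
import pandas as pd

# Toy stand-in for rdf; the values are invented for illustration only.
toy = pd.DataFrame({
    'UT': ['WOS:1', 'WOS:2', None, 'WOS:4'],
    'PY': [2020, None, 2021, 2022],
    'CR': ['Ref A; Ref B', 'Ref C', 'Ref D', None],
})

# Drop every row that has a NaN in at least one of UT, PY, or CR.
clean = toy.dropna(subset=['UT', 'PY', 'CR'])
print(len(toy), len(clean))
```

Only rows that are complete in all three columns survive, which matches applying the three `notna()` filters one after another.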


In [16]:
# Split "CR" into individual references (one per row)
# Step 1 - Create multiple rows from "CR" (each article's reference list becomes one row per reference, keyed by UT)
ref_rows = pd.DataFrame(rdf.CR.str.split('; ').tolist(),index=rdf.UT).stack()

# Step 2 - Move "UT" out of the index and back into a column
ref_rows = ref_rows.reset_index(level='UT')

# Step 3 - Name the columns
ref_rows.columns = ['UT', 'CR']
#-------------------------------------------------
# Merge the publication year into the dataframe so the saved file has the columns ID, reference, year
ref_rows1 = rdf[['UT','PY']]
ref_rows = ref_rows.merge(ref_rows1, on='UT', how='left')
ref_rows.head()
#-------------------------------------------------

Unnamed: 0,UT,CR,PY
0,WOS:000720703000001,"Aksnes DW, 2003, RES EVALUAT, V12, P159, DOI 1...",2022.0
1,WOS:000720703000001,"Antonakis J, 2008, J AM SOC INF SCI TEC, V59, ...",2022.0
2,WOS:000720703000001,"Bartneck C, 2011, SCIENTOMETRICS, V87, P85, DO...",2022.0
3,WOS:000720703000001,"Batista PD, 2006, SCIENTOMETRICS, V68, P179, D...",2022.0
4,WOS:000720703000001,"Benway BM, 2009, UROLOGY, V74, P30, DOI 10.101...",2022.0
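The split, stack, and merge steps can also be expressed with `DataFrame.explode` (available since pandas 0.25). This is an alternative sketch, not the notebook's method, shown on a toy dataframe with invented values:

```python
import pandas as pd

# Toy stand-in for rdf; the values are invented for illustration only.
toy = pd.DataFrame({
    'UT': ['WOS:1', 'WOS:2'],
    'PY': [2021, 2022],
    'CR': ['Author A, 2003, JOURNAL X; Author B, 2011, JOURNAL Y',
           'Author C, 2006, JOURNAL Y'],
})

# Split each CR string into a list, then give every list item its own row.
# UT and PY are carried along on each row, so no separate merge is needed.
exploded = (
    toy[['UT', 'CR', 'PY']]
    .assign(CR=lambda d: d['CR'].str.split('; '))
    .explode('CR')
    .reset_index(drop=True)
)
print(exploded)
```

Because the year travels with each row, this variant does not duplicate rows when the same UT appears more than once in the input, whereas a merge on `UT` would.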


In [17]:
# Save a CSV with the split references (used later to count references)
ref_rows.to_csv("CR-List.csv", index=False)
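One possible way to do the counting mentioned above is `value_counts` on the `CR` column. A minimal sketch on a toy table with placeholder reference strings; in practice the input would be read back from `CR-List.csv`:

```python
import pandas as pd

# Toy stand-in for the contents of CR-List.csv; placeholder values only.
toy_refs = pd.DataFrame({
    'UT': ['WOS:1', 'WOS:1', 'WOS:2', 'WOS:3'],
    'CR': ['Author A, 2001, JOURNAL X',
           'Author B, 2002, JOURNAL Y',
           'Author A, 2001, JOURNAL X',
           'Author A, 2001, JOURNAL X'],
})

# Count how many citing articles list each reference string,
# sorted from most to least cited.
counts = toy_refs['CR'].value_counts()
print(counts)
```

This counts exact string matches, so two spellings of the same reference are treated as different entries.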

In [24]:
# Save a CSV with the DOIs, for downloading the articles from sci-hub
du = rdf[['UT','PY','DOI']]
du.to_csv("DOI-List.csv", index=False)