<h1>Data parsing<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Import-dependencies" data-toc-modified-id="Import-dependencies-1">Import dependencies</a></span></li><li><span><a href="#Load-data" data-toc-modified-id="Load-data-2">Load data</a></span></li><li><span><a href="#Data-exploration" data-toc-modified-id="Data-exploration-3">Data exploration</a></span><ul class="toc-item"><li><span><a href="#Get-a-list-of-standalone-paragraphs-from-each-article's-body" data-toc-modified-id="Get-a-list-of-standalone-paragraphs-from-each-article's-body-3.1">Get a list of standalone paragraphs from each article's body</a></span></li><li><span><a href="#Get-a-list-of-sections-(in-each-section-are-one-or-more-paragraphs)--from-each-article's-body" data-toc-modified-id="Get-a-list-of-sections-(in-each-section-are-one-or-more-paragraphs)--from-each-article's-body-3.2">Get a list of sections (in each section are one or more paragraphs)  from each article's body</a></span></li><li><span><a href="#Get-article's-body" data-toc-modified-id="Get-article's-body-3.3">Get article's body</a></span></li><li><span><a href="#Get-article's-abstract" data-toc-modified-id="Get-article's-abstract-3.4">Get article's abstract</a></span></li><li><span><a href="#Get-article's-categories" data-toc-modified-id="Get-article's-categories-3.5">Get article's categories</a></span></li><li><span><a href="#Get-article's-title" data-toc-modified-id="Get-article's-title-3.6">Get article's title</a></span></li><li><span><a href="#Get-number-of-pages" data-toc-modified-id="Get-number-of-pages-3.7">Get number of pages</a></span></li><li><span><a href="#Get-authors-list" data-toc-modified-id="Get-authors-list-3.8">Get authors list</a></span></li><li><span><a href="#Get-author's-affiliation" data-toc-modified-id="Get-author's-affiliation-3.9">Get author's affiliation</a></span></li><li><span><a href="#Get-number-of-figures" data-toc-modified-id="Get-number-of-figures-3.10">Get number of figures</a></span></li><li><span><a 
href="#Get-final-dataframe" data-toc-modified-id="Get-final-dataframe-3.11">Get final dataframe</a></span></li></ul></li></ul></div>

## Import dependencies

In [1]:
import os, shutil
import warnings
import random
warnings.simplefilter("ignore")

from shutil import copyfile
from distutils.dir_util import copy_tree

## Load data

In [2]:
input_dir = "../datasets/2016_testing_df/"

df = sqlContext.read.format('com.databricks.spark.xml').option("rowTag", "record")\
    .load(input_dir)

In [3]:
df.rdd.count()

122

## Data exploration

### Get a list of standalone paragraphs from each article's body

Standalone paragraphs can be read either through the path **body/p** or directly from the **p** tag at the root; both give the same results.

In [4]:
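def parse_paragraphs_list(list_of_paragraphs):
    """Return the text of each paragraph, skipping missing entries."""
    list_of_paragraphs_to_ret = []
    if list_of_paragraphs is not None:
        for paragraph in list_of_paragraphs:
            if paragraph is None:
                continue
            if isinstance(paragraph, str):
                # Paragraph without attributes: already a plain string.
                list_of_paragraphs_to_ret.append(paragraph)
            elif paragraph._VALUE is not None:
                # Paragraph with attributes: the text is stored in _VALUE.
                list_of_paragraphs_to_ret.append(paragraph._VALUE)
    return list_of_paragraphs_to_ret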

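def get_paragraphs_list_from_body(input_dir):
    # rowTag is a single option: only the last value set on the reader applies,
    # so each row is a <body> element.
    df = spark.read \
              .format('com.databricks.spark.xml') \
              .options(rowTag='body') \
              .load(input_dir)
    # Key every article by its position so the results can be joined later.
    return df.select("p") \
             .rdd \
             .map(lambda row: parse_paragraphs_list(row['p'])) \
             .zipWithIndex() \
             .map(lambda record: (record[1], record[0]))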


get_paragraphs_list_from_body(input_dir).take(10)

[(0,
  ['We hypothesized that MYC-dependent metabolic dysregulation is essential for MO-TNBC. To test this hypothesis, we investigated the metabolism of a conditional doxycycline-inducible transgenic model of MO-TNBC (MTB-TOM)']),
 (1,
  ['A major challenge is maintaining membrane proteins in a lipid-like environment while keeping them stable and monodisperse in solution, so that they become accessible for biochemical, biophysical and structural studies. Methods to reconstitute membrane proteins into lipid nanoparticles provide a potential solution for this challenge. Current nanoparticle technologies that address this problem involve liposomes and high-density lipoprotein (rHDL) particles']),
 (2,
  ['PGC-1α in skeletal muscle induces broad genetic programs, including mitochondrial biogenesis and FA β-oxidation.']),
 (3,
  ['Morphogenetic patterns observed in experimental conditions are, as a rule, not linear and exploring and understanding their scaling properties is one of the bigge
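The two element shapes handled above can be tried without a Spark session. The following is a minimal sketch, not the notebook's code: `extract_paragraph_texts` is a hypothetical re-implementation, `SimpleNamespace` objects stand in for the row objects that expose `_VALUE`, and the sample strings are invented.

```python
from types import SimpleNamespace


def extract_paragraph_texts(paragraphs):
    """Keep the text of every paragraph, skipping missing entries."""
    texts = []
    for paragraph in paragraphs or []:
        if paragraph is None:
            continue
        if isinstance(paragraph, str):
            # Plain paragraph: the value is the text itself.
            texts.append(paragraph)
        elif getattr(paragraph, "_VALUE", None) is not None:
            # Row-like paragraph: the text sits in the _VALUE field.
            texts.append(paragraph._VALUE)
    return texts


# Hypothetical input mixing both shapes with missing values.
sample = [
    "Plain paragraph.",
    None,
    SimpleNamespace(_VALUE="Paragraph with attributes.", _id="p1"),
    SimpleNamespace(_VALUE=None, _id="p2"),
]
print(extract_paragraph_texts(sample))
```

Entries that are `None`, or whose `_VALUE` is `None`, are dropped rather than turned into empty strings.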

### Get a list of sections (in each section are one or more paragraphs)  from each article's body

The sections are read from the **sec** elements of each body; the result is the same as for the path **metadata/article/body**.

In [5]:
def parse_sections_list(list_of_sections):
    """Return one list of paragraph texts per section that has paragraphs."""
    if list_of_sections is None:
        return []
    return [parse_paragraphs_list(section.p)
            for section in list_of_sections
            if section.p is not None]

def get_sections_list(input_dir):
    # Only the last rowTag set on the reader applies, so each row is a <body> element.
    df = spark.read \
              .format('com.databricks.spark.xml') \
              .options(rowTag='body') \
              .load(input_dir)
    return df.select("sec") \
             .rdd \
             .map(lambda row: parse_sections_list(row['sec'])) \
             .zipWithIndex() \
             .map(lambda record: (record[1], record[0]))

get_sections_list(input_dir).take(10)

[(0, [['For U-']]),
 (1,
  [['Membrane proteins are encoded by approx. 30% of all open reading frames']]),
 (2,
  [['C2C12 cells were grown in 10 cm dishes until 90% confluency and differentiated into myotubes with 5 µg/mL insulin plus 5 µg/mL transferrin (Sigma) in DMEM for 2 days, followed by 2% horse serum in DMEM (C2C12 differentiation media) for additional 2 days. The cells were then infected with an adenovirus expressing GFP, PGC-1α or PGC-1β. Two days after infection, the cells were washed twice with PBS to remove the adenovirus and incubated with 12 mL DMEM (with or without 2% horse serum) for 2 days. 2DG (5 mM), AKT VIII (5 µM) or CHC (5 mM) was added if necessary. CMs were then collected, centrifuged at 13,000 g for 10 min at 4 °C, aliquoted and stored at –80 °C for future use. To generate PGC-1α-expressing C2C12 cell lines, cells were infected with retrovirus expressing PGC-1α or an empty-vector control. Two days after infection, the infected cells were selected with 2.5 µg/

### Get article's body

The body of the article is built as follows:
-  concatenate the standalone paragraphs (1)
-  concatenate the paragraphs within each section (2)
-  concatenate the sections (3)
-  finally, concatenate the results of (1) and (3)
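The concatenation order described above can be sketched on plain Python lists. `join_body` and the sample strings are hypothetical, and the sketch assumes every paragraph has already been reduced to a string.

```python
def join_body(standalone_paragraphs, sections):
    """Concatenate standalone paragraphs, then every paragraph of every section."""
    # Standalone paragraphs joined into one string.
    standalone_text = " ".join(standalone_paragraphs)
    # Paragraphs of each section joined into one string per section.
    section_texts = [" ".join(section) for section in sections]
    # All sections joined into one string.
    sections_text = " ".join(section_texts)
    # Standalone text first, section text second.
    return standalone_text + " " + sections_text


body = join_body(
    ["Intro one.", "Intro two."],
    [["Methods a.", "Methods b."], ["Results."]],
)
print(body)
```

When an article has no sections, the single separating space remains at the end of the string, so `join_body(["Only."], [])` gives `"Only. "`.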

In [6]:
def build_body(row_sec, row_p):
    """Join the standalone paragraphs and the section paragraphs into one string."""
    concat_parsed_sections = ' '.join([' '.join(section) for section in parse_sections_list(row_sec)])
    concat_standalone_paragraphs = ' '.join(parse_paragraphs_list(row_p))
    return concat_standalone_paragraphs + " " + concat_parsed_sections

In [7]:
def get_bodys_list(input_dir):
    # Only the last rowTag set on the reader applies, so each row is a <body> element.
    df = spark.read \
              .format('com.databricks.spark.xml') \
              .options(rowTag='body') \
              .load(input_dir)
    return df.select("sec", "p") \
             .rdd \
             .map(lambda row: build_body(row['sec'], row["p"])) \
             .zipWithIndex() \
             .map(lambda record: (record[1], record[0]))

get_bodys_list(input_dir).take(5)

[(0,
  'We hypothesized that MYC-dependent metabolic dysregulation is essential for MO-TNBC. To test this hypothesis, we investigated the metabolism of a conditional doxycycline-inducible transgenic model of MO-TNBC (MTB-TOM) '),
 (1,
  'A major challenge is maintaining membrane proteins in a lipid-like environment while keeping them stable and monodisperse in solution, so that they become accessible for biochemical, biophysical and structural studies. Methods to reconstitute membrane proteins into lipid nanoparticles provide a potential solution for this challenge. Current nanoparticle technologies that address this problem involve liposomes and high-density lipoprotein (rHDL) particles Membrane proteins are encoded by approx. 30% of all open reading frames'),
 (2,
  'PGC-1α in skeletal muscle induces broad genetic programs, including mitochondrial biogenesis and FA β-oxidation. '),
 (3,
  'Morphogenetic patterns observed in experimental conditions are, as a rule, not linear and explo

### Get article's abstract

The abstract is built the same way as the body: `build_body` is applied to the **sec** and **p** children of each **abstract** element.

In [8]:
def get_abstract_list(input_dir):
    # Only the last rowTag set on the reader applies, so each row is an <abstract> element.
    df = spark.read \
              .format('com.databricks.spark.xml') \
              .options(rowTag='abstract') \
              .load(input_dir)
    return df.select("sec", "p") \
             .rdd \
             .map(lambda row: build_body(row['sec'], row["p"])) \
             .zipWithIndex() \
             .map(lambda record: (record[1], record[0]))

get_abstract_list(input_dir).take(5)

[(0,
  'Expression of the oncogenic transcription factor MYC is disproportionately elevated in triple-negative breast cancer (TNBC) compared to estrogen, progesterone and human epidermal growth factor 2 receptor-positive (RP) breast tumors '),
 (1,
  'Membrane proteins are of outstanding importance in biology, drug discovery and vaccination. A common limiting factor in research and applications involving membrane proteins is the ability to solubilize and stabilize membrane proteins. Although detergents represent the major means for solubilizing membrane proteins, they are often associated with protein instability and poor applicability in structural and biophysical studies. Here, we present a novel lipoprotein nanoparticle system that allows for the reconstitution of membrane proteins into a lipid environment that is stabilized by a scaffold of Saposin proteins. We showcase the applicability of the method on two purified membrane protein complexes as well as the direct solubilization a

### Get article's categories

In [9]:
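def get_article_categories(row):
    """Return the list of subjects found in an article-categories element."""
    subj_groups = row['subj-group']
    # Depending on the inferred schema, subj-group is a single row or a list of rows.
    if not isinstance(subj_groups, list):
        subj_groups = [subj_groups]
    categories = []
    for subj_group in subj_groups:
        subject = subj_group['subject']
        if isinstance(subject, str):
            categories.append(subject)
        elif isinstance(subject, list):
            categories.extend(subject)
    # Return the collected subjects (the original returned the raw subj-group).
    return categories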
    

def get_categories_list(input_dir):
    # Only the last rowTag set on the reader applies, so each row is an <article-meta> element.
    df = spark.read \
              .format('com.databricks.spark.xml') \
              .options(rowTag='article-meta') \
              .load(input_dir)

    return df.select("article-categories") \
             .rdd \
             .map(lambda row: get_article_categories(row['article-categories'])) \
             .zipWithIndex() \
             .map(lambda record: (record[1], record[0]))

get_categories_list(input_dir).take(100)

[(0, ['Article']),
 (1, ['Article']),
 (2, ['Article']),
 (3, ['Article']),
 (4, ['Original Paper']),
 (5, ['Review Paper']),
 (6, ['Original Paper']),
 (7, ['Original Paper']),
 (8, ['Original Paper']),
 (9, ['Original Paper']),
 (10, ['Original Paper']),
 (11, ['Original Paper']),
 (12, ['Original Paper']),
 (13, ['Original Paper']),
 (14, ['Original Paper']),
 (15, ['Original Paper']),
 (16, ['Original Paper']),
 (17, ['Original Paper']),
 (18, ['Articles']),
 (19, ['Neuropharmacology']),
 (20, ['Drug Discovery and Translational Medicine']),
 (21, ['Article']),
 (22, ['Original Article']),
 (23, ['Original Article']),
 (24, ['Original Article']),
 (25, ['Article']),
 (26, ['Article']),
 (27, ['Article']),
 (28, ['Article']),
 (29, ['Article']),
 (30, ['Article']),
 (31, ['Article']),
 (32, ['Article']),
 (33, ['Article']),
 (34, ['Article']),
 (35, ['Article']),
 (36, ['Article']),
 (37, ['Article']),
 (38, ['Article']),
 (39, ['Article']),
 (40, ['Article']),
 (41, ['Article']),
 (
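The category field can arrive in more than one shape, which is why the parsing code branches on the type. The sketch below shows the same normalization on plain dictionaries instead of Spark rows; `normalize_subjects`, `collect_categories` and the sample inputs are hypothetical, and the assumption is that a subject is either a string, a list of strings, or missing.

```python
def normalize_subjects(subject):
    """Return a flat list of category names for a str, a list, or a missing value."""
    if subject is None:
        return []
    if isinstance(subject, str):
        return [subject]
    return list(subject)


def collect_categories(subj_groups):
    """Gather the subjects of one subject group or of a list of subject groups."""
    if isinstance(subj_groups, dict):
        # A single group: wrap it so the loop below handles both shapes.
        subj_groups = [subj_groups]
    categories = []
    for group in subj_groups:
        categories.extend(normalize_subjects(group.get("subject")))
    return categories


single = collect_categories({"subject": "Article"})
several = collect_categories([
    {"subject": ["Neuropharmacology", "Articles"]},
    {"subject": "Original Paper"},
    {},  # group without a subject
])
print(single, several)
```

Wrapping the single-group case in a list keeps one code path for both shapes, so no exception handler is needed to tell them apart.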

### Get article's title

In [10]:
def get_titles_list(input_dir):
    # Only the last rowTag set on the reader applies, so each row is a <title-group> element.
    df = spark.read \
              .format('com.databricks.spark.xml') \
              .options(rowTag='title-group') \
              .load(input_dir)

    return df.select("article-title") \
             .rdd \
             .map(lambda row: row['article-title']) \
             .zipWithIndex() \
             .map(lambda record: (record[1], record[0]))

get_titles_list(input_dir).take(5)

[(0,
  'Inhibition of fatty acid oxidation as a therapy for MYC-overexpressing triple-negative breast cancer'),
 (1, 'A novel lipoprotein nanoparticle system for membrane proteins'),
 (2,
  'A branched chain amino acid metabolite drives vascular transport of fat and causes insulin resistance'),
 (3, 'Scaling of morphogenetic patterns in reaction-diffusion systems'),
 (4, 'Association analysis of ')]

### Get number of pages

In [11]:
def get_page_counts(row):
    """Return the page count as a string, or "Not specified" when it is absent."""
    try:
        return str(row['page-count']._count)
    except Exception:
        return "Not specified"

def get_number_of_pages_list(input_dir):
    # Only the last rowTag set on the reader applies, so each row is an <article> element.
    df = spark.read \
              .format('com.databricks.spark.xml') \
              .options(rowTag='article') \
              .load(input_dir)
    try:
        return df.select("counts") \
                 .rdd \
                 .map(lambda row: get_page_counts(row['counts'])) \
                 .zipWithIndex() \
                 .map(lambda record: (record[1], record[0]))
    except Exception:
        # select() fails when no 'counts' column exists: mark every article as unspecified.
        return df.select("_xmlns") \
                 .rdd \
                 .map(lambda row: "Not specified") \
                 .zipWithIndex() \
                 .map(lambda record: (record[1], record[0]))

get_number_of_pages_list(input_dir).collect()

[(0, 'Not specified'),
 (1, 'Not specified'),
 (2, 'Not specified'),
 (3, 'Not specified'),
 (4, 'Not specified'),
 (5, 'Not specified'),
 (6, 'Not specified'),
 (7, 'Not specified'),
 (8, 'Not specified'),
 (9, 'Not specified'),
 (10, 'Not specified'),
 (11, 'Not specified'),
 (12, 'Not specified'),
 (13, 'Not specified'),
 (14, 'Not specified'),
 (15, 'Not specified'),
 (16, 'Not specified'),
 (17, 'Not specified'),
 (18, 'Not specified'),
 (19, '11'),
 (20, '16'),
 (21, 'Not specified'),
 (22, 'Not specified'),
 (23, 'Not specified'),
 (24, 'Not specified'),
 (25, 'Not specified'),
 (26, 'Not specified'),
 (27, 'Not specified'),
 (28, 'Not specified'),
 (29, 'Not specified'),
 (30, 'Not specified'),
 (31, 'Not specified'),
 (32, 'Not specified'),
 (33, 'Not specified'),
 (34, 'Not specified'),
 (35, 'Not specified'),
 (36, 'Not specified'),
 (37, 'Not specified'),
 (38, 'Not specified'),
 (39, 'Not specified'),
 (40, 'Not specified'),
 (41, 'Not specified'),
 (42, 'Not specified'),


### Get authors list

In [12]:
def get_authors(row):
    """Return the full names of all contributors of type 'author'."""
    authors = []
    try:
        for contrib in row['contrib']:
            if contrib['_contrib-type'] == 'author':
                authors.append(contrib['name']['given-names'] + ' ' + contrib['name']['surname'])
        return authors
    except Exception:
        return ['Not specified']

def get_authors_list(input_dir):
    # Only the last rowTag set on the reader applies, so each row is an <article-meta> element.
    df = spark.read \
              .format('com.databricks.spark.xml') \
              .options(rowTag='article-meta') \
              .load(input_dir)

    return df.select("contrib-group") \
             .rdd \
             .map(lambda row: get_authors(row['contrib-group'])) \
             .zipWithIndex() \
             .map(lambda record: (record[1], record[0]))

get_authors_list(input_dir).take(5)

[(0,
  ['Roman Camarda',
   'Zhou Zhou',
   'Rebecca A. Kohnz',
   'Sanjeev Balakrishnan',
   'Celine Mahieu',
   'Brittany Anderton',
   'Henok Eyob',
   'Shingo Kajimura',
   'Aaron Tward',
   'Gregor Krings',
   'Daniel K. Nomura',
   'Andrei Goga']),
 (1,
  ['Jens Frauenfeld',
   'Robin Löving',
   'Jean-Paul Armache',
   'Andreas Sonnen',
   'Fatma Guettou',
   'Per Moberg',
   'Lin Zhu',
   'Caroline Jegerschöld',
   'Ali Flayhan',
   'John A.G. Briggs',
   'Henrik Garoff',
   'Christian Löw',
   'Yifan Cheng',
   'Pär Nordlund']),
 (2,
  ['Cholsoon Jang',
   'Sungwhan F Oh',
   'Shogo Wada',
   'Glenn C Rowe',
   'Laura Liu',
   'Mun Chun Chan',
   'James Rhee',
   'Atsushi Hoshino',
   'Boa Kim',
   'Ayon Ibrahim',
   'Luisa G Baca',
   'Esl Kim',
   'Chandra C Ghosh',
   'Samir M Parikh',
   'Aihua Jiang',
   'Qingwei Chu',
   'Daniel E. Forman',
   'Stewart H. Lecker',
   'Saikumari Krishnaiah',
   'Joshua D Rabinowitz',
   'Aalim M Weljie',
   'Joseph A Baur',
   'Dennis L K

### Get author's affiliation

In [13]:
def get_affiliations(row):
    """Return, for each author, the (country, institution) pairs they reference."""
    affiliations = dict()
    affiliations_for_authors = []
    try:
        # Index the affiliations by their id.
        for aff in row['aff']:
            affiliations[aff['_id']] = (aff['country'], aff['institution'])

        # Resolve the affiliation references of every author.
        for contrib in row['contrib']:
            if contrib['_contrib-type'] == 'author':
                ids_aff = [aff['_rid'] for aff in contrib['xref']]
                affiliations_for_authors.append([])

                for id_aff in ids_aff:
                    affiliations_for_authors[-1].append(affiliations[id_aff])
        return affiliations_for_authors
    except Exception:
        return ['Not specified']

def get_affiliations_list(input_dir):
    # Only the last rowTag set on the reader applies, so each row is an <article-meta> element.
    df = spark.read \
              .format('com.databricks.spark.xml') \
              .options(rowTag='article-meta') \
              .load(input_dir)

    return df.select("contrib-group") \
             .rdd \
             .map(lambda row: get_affiliations(row['contrib-group'])) \
             .zipWithIndex() \
             .map(lambda record: (record[1], record[0]))

get_affiliations_list(input_dir).collect()

[(0, ['Not specified']),
 (1, ['Not specified']),
 (2, ['Not specified']),
 (3, ['Not specified']),
 (4, ['Not specified']),
 (5, ['Not specified']),
 (6, ['Not specified']),
 (7, ['Not specified']),
 (8, ['Not specified']),
 (9, ['Not specified']),
 (10, ['Not specified']),
 (11, ['Not specified']),
 (12, ['Not specified']),
 (13, ['Not specified']),
 (14, ['Not specified']),
 (15, ['Not specified']),
 (16, ['Not specified']),
 (17, ['Not specified']),
 (18, ['Not specified']),
 (19, ['Not specified']),
 (20, ['Not specified']),
 (21, ['Not specified']),
 (22, ['Not specified']),
 (23, ['Not specified']),
 (24, ['Not specified']),
 (25, ['Not specified']),
 (26, ['Not specified']),
 (27, ['Not specified']),
 (28, ['Not specified']),
 (29, ['Not specified']),
 (30, ['Not specified']),
 (31, ['Not specified']),
 (32, ['Not specified']),
 (33, ['Not specified']),
 (34, ['Not specified']),
 (35, ['Not specified']),
 (36, ['Not specified']),
 (37, ['Not specified']),
 (38, ['Not specified'
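The lookup that the affiliation code aims for can be illustrated on plain dictionaries. This is a minimal sketch under stated assumptions, not the notebook's code: `resolve_affiliations` is a hypothetical helper, the field names mirror those used above (`_id`, `_rid`, `_contrib-type`), and all sample values are invented.

```python
def resolve_affiliations(affs, contribs):
    """Return, per author, the (country, institution) pairs the author references."""
    # Index affiliations by id for direct lookup.
    by_id = {aff["_id"]: (aff["country"], aff["institution"]) for aff in affs}
    per_author = []
    for contrib in contribs:
        if contrib.get("_contrib-type") != "author":
            continue
        # References to unknown ids are skipped here instead of raising.
        per_author.append([
            by_id[xref["_rid"]]
            for xref in contrib.get("xref", [])
            if xref["_rid"] in by_id
        ])
    return per_author


# Invented sample data.
affs = [
    {"_id": "aff1", "country": "Countryland", "institution": "Example University"},
    {"_id": "aff2", "country": "Otherland", "institution": "Sample Institute"},
]
contribs = [
    {"_contrib-type": "author", "xref": [{"_rid": "aff1"}, {"_rid": "aff2"}]},
    {"_contrib-type": "editor", "xref": [{"_rid": "aff1"}]},
    {"_contrib-type": "author", "xref": [{"_rid": "aff2"}, {"_rid": "missing"}]},
]
print(resolve_affiliations(affs, contribs))
```

Skipping unknown references is a design choice of this sketch: one bad reference then costs a single entry rather than the whole article.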

### Get number of figures

In [14]:
def get_fig_no(row):
    """Return the number of figures as a string, or 'Not specified' when absent."""
    try:
        if row['fig'] is not None:
            if isinstance(row['fig'], list):
                return str(len(row['fig']))
            # A single figure is not wrapped in a list.
            return "1"
        else:
            return 'Not specified'
    except Exception:
        return 'Not specified'

def get_number_of_figures(input_dir):
    df = spark.read \
              .format('com.databricks.spark.xml') \
              .options(rowTag='record') \
              .load(input_dir)

    return df.select('fig') \
             .rdd \
             .map(get_fig_no) \
             .zipWithIndex() \
             .map(lambda record: (record[1], record[0]))

get_number_of_figures(input_dir).collect()

[(0, 'Not specified'),
 (1, 'Not specified'),
 (2, 'Not specified'),
 (3, 'Not specified'),
 (4, 'Not specified'),
 (5, 'Not specified'),
 (6, 'Not specified'),
 (7, 'Not specified'),
 (8, 'Not specified'),
 (9, 'Not specified'),
 (10, 'Not specified'),
 (11, 'Not specified'),
 (12, 'Not specified'),
 (13, 'Not specified'),
 (14, 'Not specified'),
 (15, 'Not specified'),
 (16, 'Not specified'),
 (17, 'Not specified'),
 (18, 'Not specified'),
 (19, 'Not specified'),
 (20, 'Not specified'),
 (21, 'Not specified'),
 (22, 'Not specified'),
 (23, 'Not specified'),
 (24, 'Not specified'),
 (25, 'Not specified'),
 (26, 'Not specified'),
 (27, 'Not specified'),
 (28, 'Not specified'),
 (29, 'Not specified'),
 (30, 'Not specified'),
 (31, 'Not specified'),
 (32, 'Not specified'),
 (33, 'Not specified'),
 (34, 'Not specified'),
 (35, 'Not specified'),
 (36, 'Not specified'),
 (37, 'Not specified'),
 (38, 'Not specified'),
 (39, 'Not specified'),
 (40, 'Not specified'),
 (41, 'Not specified'),
 (

### Get final dataframe

In [17]:
def get_final_rdd(input_dir):
    return get_abstract_list(input_dir) \
                .join(get_paragraphs_list_from_body(input_dir)) \
                .join(get_sections_list(input_dir)) \
                .join(get_bodys_list(input_dir)) \
                .join(get_categories_list(input_dir)) \
                .join(get_titles_list(input_dir)) \
                .join(get_number_of_pages_list(input_dir)) \
                .join(get_authors_list(input_dir)) \
                .join(get_affiliations_list(input_dir)) \
                .join(get_number_of_figures(input_dir)) \
                .sortByKey()

def get_final_to_pandas(input_dir):
    return get_final_rdd(input_dir).map(lambda record: (record[1][0][0][0][0][0][0][0][0][0],
                                               record[1][0][0][0][0][0][0][0][0][1], 
                                               record[1][0][0][0][0][0][0][0][1], 
                                               record[1][0][0][0][0][0][0][1], 
                                               record[1][0][0][0][0][0][1],  
                                               record[1][0][0][0][0][1],
                                               record[1][0][0][0][1],
                                               record[1][0][0][1],
                                               record[1][0][1],
                                               record[1][1]))\
                          .toDF(["abstract", "paragraphs", "sections", "body", "categories", "title", "pages count",
                                "authors", "affiliations", "figures count"])

get_final_to_pandas(input_dir).toPandas().head(30)

| | abstract | paragraphs | sections | body | categories | title | pages count | authors | affiliations | figures count |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Expression of the oncogenic transcription fact... | [We hypothesized that MYC-dependent metabolic ... | [[For U-]] | We hypothesized that MYC-dependent metabolic d... | [Article] | Inhibition of fatty acid oxidation as a therap... | Not specified | [Roman Camarda, Zhou Zhou, Rebecca A. Kohnz, S... | [Not specified] | Not specified |
| 1 | Membrane proteins are of outstanding importanc... | [A major challenge is maintaining membrane pro... | [[Membrane proteins are encoded by approx. 30%... | A major challenge is maintaining membrane prot... | [Article] | A novel lipoprotein nanoparticle system for me... | Not specified | [Jens Frauenfeld, Robin Löving, Jean-Paul Arma... | [Not specified] | Not specified |
| 2 | Epidemiological and experimental data implicat... | [PGC-1α in skeletal muscle induces broad genet... | [[C2C12 cells were grown in 10 cm dishes until... | PGC-1α in skeletal muscle induces broad geneti... | [Article] | A branched chain amino acid metabolite drives ... | Not specified | [Cholsoon Jang, Sungwhan F Oh, Shogo Wada, Gle... | [Not specified] | Not specified |
| 3 | Development of multicellular organisms is comm... | [Morphogenetic patterns observed in experiment... | [[The greatest manifestation of biological dev... | Morphogenetic patterns observed in experimenta... | [Article] | Scaling of morphogenetic patterns in reaction-... | Not specified | [Manan’Iarivo Rasolonjanahary, Bakhtier Vasiev] | [Not specified] | Not specified |
| 4 | | [The heritability of muscle strength and power... | [[Environmental and genetic factors influence ... | The heritability of muscle strength and power ... | [Original Paper] | Association analysis of | Not specified | [Not specified] | [Not specified] | Not specified |
| 5 | The performance of professional strength and p... | [Excessive body weight gain because of an incr... | [[Regular physical activity has significant be... | Excessive body weight gain because of an incre... | [Review Paper] | Genetic variants influencing effectiveness of ... | Not specified | [A Leońska-Duniec, II Ahmetov, P Zmijewski] | [Not specified] | Not specified |
| 6 | Frequent and regular physical activity has sig... | [There is growing evidence linking T to motiva... | [[In sport, the testosterone (T) contribution ... | There is growing evidence linking T to motivat... | [Original Paper] | Temporal associations between individual chang... | Not specified | [BT Crewther, J Carruthers, LP Kilduff, CE San... | [Not specified] | Not specified |
| 7 | To advance our understanding of the hormonal c... | [Nevertheless, the mechanism responsible for t... | [[Effective energy metabolism and transport of... | Nevertheless, the mechanism responsible for th... | [Original Paper] | The effect of the competitive season in profes... | Not specified | [A Dzedzej, W Ignatiuk, J Jaworska, T Grzywacz... | [Not specified] | Not specified |
| 8 | Following acute physical activity, blood hepci... | [Fortunately, the session rating of perceived ... | [[The seven-a-side version of rugby union has ... | Fortunately, the session rating of perceived e... | [Original Paper] | Multifactorial monitoring of training load in ... | Not specified | [T Bouaziz, E Makni, P Passelergue, Z Tabka, G... | [Not specified] | Not specified |
| 9 | The effectiveness of selected physiological an... | [Research has examined set volume experimental... | [[Resistance training (RT) is widely recognise... | Research has examined set volume experimentall... | [Original Paper] | A comparison of low volume 'high-intensity-tra... | Not specified | [J Giessing, B Eichmann, J Steele, J Fisher] | [Not specified] | Not specified |
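Each `join` on a pair RDD wraps the value as `(previous_value, new_field)`, which is why the final mapping needs chains such as `record[1][0][0]...`. A possible alternative is sketched below on plain tuples: `flatten_join_value` is a hypothetical helper, not part of the notebook, and it assumes the number of chained joins is known (nine in `get_final_rdd`).

```python
def flatten_join_value(value, depth):
    """Unnest the value produced by `depth` chained pair joins into a flat tuple."""
    fields = []
    for _ in range(depth):
        # Each join leaves a 2-tuple: (everything joined so far, newest field).
        value, last = value
        fields.append(last)
    # What remains is the field contributed by the first RDD.
    fields.append(value)
    return tuple(reversed(fields))


# Shape of a value after four joins, using invented field contents.
nested = (((("abstract", ["p"]), [["s"]]), "body"), ["Article"])
print(flatten_join_value(nested, 4))
```

Passing the depth explicitly keeps fields that are themselves lists or tuples intact, since the helper unwraps exactly one level per join. Under these assumptions the mapping in `get_final_to_pandas` could read `lambda record: flatten_join_value(record[1], 9)`.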
