In [16]:
import pandas as pd
import re
from fractions import Fraction


In [26]:
df = pd.read_csv('pubmed_data.csv')

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None) 
df.head(3)  # View the first few rows

Unnamed: 0,PMID,Title,Abstract,Authors,Publication Date,DOI
0,25529590,WAIS-IV administration errors: effects of altered response requirements on Symbol Search and violation of standard surface-variety patterns on Block Design.,"This study utilized a sample of 50 college students to assess the possibility that responding to the Wechsler Adult Intelligence Scale-Fourth Edition (WAIS-IV) Symbol Search subtest items with an ""x"" instead of a ""single slash mark"" would affect performance. A second sample of 50 college students was used to assess the impact on WAIS-IV Block Design performance of presenting all the items with only red surfaces facing up. The modified Symbol Search and Block Design administrations yielded mean scaled scores and raw scores that did not differ significantly from mean scores obtained with standard administrations. Findings should not be generalized beyond healthy, well-educated young adults.",Joseph J Ryan; Nicole Swopes-Willhite; Cassi Franklin; David S Kreiner,2013-12-05,10.1080/23279095.2013.828726
1,25529585,Apparently abnormal Wechsler Memory Scale index score patterns in the normal population.,"Interpretation of the Wechsler Memory Scale-Fourth Edition may involve examination of multiple memory index score contrasts and similar comparisons with Wechsler Adult Intelligence Scale-Fourth Edition ability indexes. Standardization sample data suggest that 15-point differences between any specific pair of index scores are relatively uncommon in normal individuals, but these base rates refer to a comparison between a single pair of indexes rather than multiple simultaneous comparisons among indexes. This study provides normative data for the occurrence of multiple index score differences calculated by using Monte Carlo simulations and validated against standardization data. Differences of 15 points between any two memory indexes or between memory and ability indexes occurred in 60% and 48% of the normative sample, respectively. Wechsler index score discrepancies are normally common and therefore not clinically meaningful when numerous such comparisons are made. Explicit prior interpretive hypotheses are necessary to reduce the number of index comparisons and associated false-positive conclusions. Monte Carlo simulation accurately predicts these false-positive rates.",Roman Marcus Carrasco; Josefine Grups; Brittney Evans; Edward Simco; Wiley Mittenberg,2013-11-12,10.1080/23279095.2013.816702
2,25284715,A longitudinal intergenerational analysis of executive functions during early childhood.,"Despite the importance of executive function (EF) in both clinical and educational contexts, the aetiology of individual differences in early childhood EF remains poorly understood. This study provides the first longitudinal intergenerational analysis of mother-child EF associations during early childhood. A group of children and their mothers (n = 62) completed age-appropriate EF tasks. Mother and child EFs were modestly correlated by 24 months of age, and this association was stable through 48 months. Importantly, maternal-child EF associations were still robust after controlling for verbal ability (potential indicator of verbal/crystallized intelligence) and maternal education (correlate of socio-economic status and verbal intelligence). Potential implications of these findings as well as underlying mechanisms of the maternal-child EF association (gene-environment interplay) are discussed.",Kimberly Cuevas; Kirby Deater-Deckard; Jungmeen Kim-Spoon; Zhe Wang; Katherine C Morasch; Martha Ann Bell,2013-10-28,10.1111/bjdp.12021


In [19]:
print(df.shape)
print(df.isnull().sum())  # Check for missing values
df.columns  # Check all column names 

(1620, 6)
PMID                  0
Title                 0
Abstract              0
Authors               1
Publication Date    383
DOI                 102
dtype: int64


Index(['PMID', 'Title', 'Abstract', 'Authors', 'Publication Date', 'DOI'], dtype='object')

In [20]:
# Replace NaN values with "Unknown"
df.fillna("Unknown", inplace=True)

In [27]:
# Clean data: remove special characters, convert text to lowercase
# df['Abstract'] = df['Abstract'].str.replace('[^a-zA-Z]', ' ').str.lower()
# df['Title'] = df['Title'].str.replace('[^a-zA-Z]', ' ').str.lower()
df['Abstract'] = df['Abstract'].str.replace('[^a-zA-Z0-9]', ' ')    #.str.lower()
df['Title'] = df['Title'].str.replace('[^a-zA-Z0-9]', ' ')          #.str.lower()

df.head(10)

Unnamed: 0,PMID,Title,Abstract,Authors,Publication Date,DOI
0,25529590,WAIS-IV administration errors: effects of altered response requirements on Symbol Search and violation of standard surface-variety patterns on Block Design.,"This study utilized a sample of 50 college students to assess the possibility that responding to the Wechsler Adult Intelligence Scale-Fourth Edition (WAIS-IV) Symbol Search subtest items with an ""x"" instead of a ""single slash mark"" would affect performance. A second sample of 50 college students was used to assess the impact on WAIS-IV Block Design performance of presenting all the items with only red surfaces facing up. The modified Symbol Search and Block Design administrations yielded mean scaled scores and raw scores that did not differ significantly from mean scores obtained with standard administrations. Findings should not be generalized beyond healthy, well-educated young adults.",Joseph J Ryan; Nicole Swopes-Willhite; Cassi Franklin; David S Kreiner,2013-12-05,10.1080/23279095.2013.828726
1,25529585,Apparently abnormal Wechsler Memory Scale index score patterns in the normal population.,"Interpretation of the Wechsler Memory Scale-Fourth Edition may involve examination of multiple memory index score contrasts and similar comparisons with Wechsler Adult Intelligence Scale-Fourth Edition ability indexes. Standardization sample data suggest that 15-point differences between any specific pair of index scores are relatively uncommon in normal individuals, but these base rates refer to a comparison between a single pair of indexes rather than multiple simultaneous comparisons among indexes. This study provides normative data for the occurrence of multiple index score differences calculated by using Monte Carlo simulations and validated against standardization data. Differences of 15 points between any two memory indexes or between memory and ability indexes occurred in 60% and 48% of the normative sample, respectively. Wechsler index score discrepancies are normally common and therefore not clinically meaningful when numerous such comparisons are made. Explicit prior interpretive hypotheses are necessary to reduce the number of index comparisons and associated false-positive conclusions. Monte Carlo simulation accurately predicts these false-positive rates.",Roman Marcus Carrasco; Josefine Grups; Brittney Evans; Edward Simco; Wiley Mittenberg,2013-11-12,10.1080/23279095.2013.816702
2,25284715,A longitudinal intergenerational analysis of executive functions during early childhood.,"Despite the importance of executive function (EF) in both clinical and educational contexts, the aetiology of individual differences in early childhood EF remains poorly understood. This study provides the first longitudinal intergenerational analysis of mother-child EF associations during early childhood. A group of children and their mothers (n = 62) completed age-appropriate EF tasks. Mother and child EFs were modestly correlated by 24 months of age, and this association was stable through 48 months. Importantly, maternal-child EF associations were still robust after controlling for verbal ability (potential indicator of verbal/crystallized intelligence) and maternal education (correlate of socio-economic status and verbal intelligence). Potential implications of these findings as well as underlying mechanisms of the maternal-child EF association (gene-environment interplay) are discussed.",Kimberly Cuevas; Kirby Deater-Deckard; Jungmeen Kim-Spoon; Zhe Wang; Katherine C Morasch; Martha Ann Bell,2013-10-28,10.1111/bjdp.12021
3,25265311,The design organization test: further demonstration of reliability and validity as a brief measure of visuospatial ability.,"Neuropsychological assessments are frequently time-consuming and fatiguing for patients. Brief screening evaluations may reduce test duration and allow more efficient use of time by permitting greater attention toward neuropsychological domains showing probable deficits. The Design Organization Test (DOT) was initially developed as a 2-min paper-and-pencil alternative for the Block Design (BD) subtest of the Wechsler scales. Although initially validated for clinical neurologic patients, we sought to further establish the reliability and validity of this test in a healthy, more diverse population. Two alternate versions of the DOT and the Wechsler Abbreviated Scale of Intelligence (WASI) were administered to 61 healthy adult participants. The DOT showed high alternate forms reliability (r = .90-.92), and the two versions yielded equivalent levels of performance. The DOT was highly correlated with BD (r = .76-.79) and was significantly correlated with all subscales of the WASI. The DOT proved useful when used in lieu of BD in the calculation of WASI IQ scores. Findings support the reliability and validity of the DOT as a measure of visuospatial ability and suggest its potential worth as an efficient estimate of intellectual functioning in situations where lengthier tests may be inappropriate or unfeasible.",William D S Killgore; Hannah Gogel,2013-11-05,10.1080/23279095.2013.811671
4,25258654,A Practical Testing Battery to Measure Neurobehavioral Ability among Children with FASD.,"To determine a brief, practical battery of tests that discriminate between children with a fetal alcohol spectrum disorder (FASD) and unexposed controls. Children received dysmorphology exams, a targeted battery of cognitive and behavioral tests, and their mothers were interviewed about maternal risk factors. Children diagnosed with an FASD and children unexposed to alcohol prenatally were compared on cognitive/behavioral test results. A community in The Western Cape Province of South Africa. Sixty-one, first grade children with FASD and 52 matched normal controls. Statistical analyses of maternal drinking behavior and their child's test performance. Self-reported maternal drinking patterns before during and after pregnancy were used to confirm prenatal exposures to alcohol in the group of children diagnosed with FASD. With this sample of children diagnosed with FASD and completely unexposed controls, the adverse effects of maternal drinking on children's performance are reported. Results of the battery of standardized cognitive and behavioral tests indicate highly significant differences (p ≤ .001) between groups on: intelligence, perceptual motor, planning, and logical, spatial, short term, long term, and working memory abilities. Furthermore, a binary logistical regression model of only 3 specific cognitive and behavioral tests, including Digit Span A+B (Wald = 4.10), Absurd Situation (Wald = 3.57), and Word Association (Wald = 4.30) correctly classified 79.1% of the child participants as FASD or controls. A brief, practical set of tests can discriminate children with and without FASD and provide useful information for interventions for affected children.",Wendy O Kalberg; Philip A May; Jason Blankenship; David Buckley; J Phillip Gossage; Colleen M Adnams,,10.7895/ijadr.v2i3.83
5,25076796,"Spatial, Temporal and Spatio-Temporal Patterns of Maritime Piracy.","To examine patterns in the timing and location of incidents of maritime piracy to see whether, like many urban crimes, attacks cluster in space and time. Data for all incidents of maritime piracy worldwide recorded by the National Geospatial Intelligence Agency are analyzed using time-series models and methods originally developed to detect disease contagion. At the macro level, analyses suggest that incidents of pirate attacks are concentrated in five subregions of the earth's oceans and that the time series for these different subregions differ. At the micro level, analyses suggest that for the last 16 years (or more), pirate attacks appear to cluster in space and time suggesting that patterns are not static but are also not random. Much like other types of crime, pirate attacks cluster in space, and following an attack at one location the risk of others at the same location or nearby is temporarily elevated. The identification of such regularities has implications for the understanding of maritime piracy and for predicting the future locations of attacks.",Elio Marchione; Shane D Johnson,,10.1177/0022427812469113
6,25030251,Factors influencing post-traumatic stress in Korean forensic science investigators.,"The aim of this study was to understand factors that influence post-traumatic stress (PTS) in Korean forensic science investigators. A total of 111 forensic science investigators were recruited in Korea. PTS was measured using the tool modified by Choi (2001) from the original developed by Foa, Riggs, Dancu, and Rothbaum (1993) based on DSM-IV. Factors influencing PTS included demographic and job-related characteristics, emotional intelligence, and death anxiety. PTS scores were positively correlated with personality type, fatigue from work, and death anxiety. PTS scores were negatively correlated with length of career as a forensic science investigator and emotional intelligence. The factors that had the greatest influence on PTS were death anxiety, years spent as a forensic science investigator, personality type, emotional intelligence, fatigue, and homicide experience. The explanatory power of these six factors was 44.0%. Therefore, it is necessary to regularly evaluate the mental health of those who are vulnerable to PTS. Based on these results, various interventions could be implemented for promoting overall health of the forensic science investigators.",Yang-Sook Yoo; Ok-Hee Cho; Kyeong-Sook Cha; Yun-Jeong Boo,2013-07-18,10.1016/j.anr.2013.07.002
7,25007536,[The role of the jumping to conclusion bias in delusions formation].,"The results of many researches indicate that individuals with delusions reveal the reasoning bias. In probabilistic reasoning tasks they reveal hastiness in decision-making. The individuals with delusions request less information than non-deluded individuals, even if additional data is easily available. What is more, they also prove to be convinced to a greater extend of having made the right decision. This finding has been replicated by a number of studies. However, the previous researches have not confirmed the origins of 'jumping to conclusion' bias, and its role in the process of forming delusions has not been yet confirmed. The article in question contains the review of the results of the jumping to conclusion bias in people with delusions. It discusses the main hypotheses explaining the relations between the hasty decision making and the delusions formation. The article also deals with the specifics of 'jumping to conclusion' bias in case of individuals with delusions, as well as summarizes its relation to factors such as the level of intelligence or the intensity of delusion.",Jagoda Rózycka; Katarzyna Prochwicz,,
8,24990635,What my parents make me believe in learning: the role of filial piety in Hong Kong students' motivation and academic achievement.,"Chinese students are well-known for their academic excellence. However, studies that explore the underlying mechanism of how cultural factors relate to the motivational process and academic achievement of Chinese students have been limited. This study aimed to examine the role of filial piety in shaping Chinese students' theories of intelligence so as to obtain a clearer understanding of the process by which parent-child connectedness is linked to Chinese students' academic achievement. A sample of 312 university students in Hong Kong were assessed concerning their filial piety beliefs, theories of intelligence and academic achievement. Data were analysed using structural equation modelling. The results indicated that different filial piety beliefs relate to students' academic achievement by shaping different theories of intelligence. Reciprocal filial piety beliefs were found to facilitate an incremental view of intelligence, which in turn contributes to students' academic achievement. Authoritarian filial piety beliefs were shown to be associated with an entity view of intelligence, which consequently deteriorates students' academic achievement. Cultural views of motivational processes can shed light on how motivational beliefs are developed as a product of cultural or socialization processes, which, in turn, contribute to students' academic success.",Wei-Wen Chen; Yi-Lee Wong,2013-11-20,10.1002/ijop.12014
9,24971291,"Spiritual intelligence, resiliency, and withdrawal time in clients of methadone maintenance treatment.","Reports show an increasing interest in spirituality. It has been revealed that people with spiritual tendencies, can better deal with a trauma, manage the stressful situations, and have greater improvement in their health condition. Our aim was to examine the relationship between spiritual intelligence and resiliency, and the relation of these two variables with the withdrawal time of individuals treated with methadone. This research was conducted on patients referred to the addiction center of Baharan Psychiatric Hospital in Zahedan, Iran. Our sample included 100 referrals; they were provided with questionnaires and asked to answer them honestly. King's spiritual intelligence questionnaire and resilience questionnaires were used. There were significant positive correlations between resiliency and scores of spiritual intelligence as well as with subscales of spiritual intelligence. In addition, there were significant positive correlations between withdrawal time and scores of spiritual intelligence as well as with subscales of spiritual intelligence as well as with resiliency. Relationships between the spiritual intelligence and resiliency parameters with withdrawal time show that these parameters can have a role in relapse protection among addicted people.",Behnaz Shahbakhsh; Sedighe Moallemi,2013-12-22,10.5812/ijhrba.11308


In [28]:
# Save file
df.to_csv('processed_data.csv', index=False)