**PROJECT DESCRIPTION**

This project explores data related to Guillain-Barré Syndrome (GBS) from two sources: clinical trial records and search engine results. Using PySpark, I clean, transform, and explore each dataset independently to identify patterns in it.

**Fetching and Setting Up Data from the ClinicalTrials.gov and SerpApi APIs**

In [0]:
# Fetching Clinical Trials Data for Guillain-Barré Syndrome using the ClinicalTrials.gov API

import requests

# URL and parameters
url = "https://clinicaltrials.gov/api/v2/studies"
params = {
    "query.cond": "Guillain-Barre",
    "pageSize": "100"
}

# To store all results
all_studies = []  

while True:
    # API request; raise on HTTP errors instead of parsing an error body
    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()
    data = response.json()
    
    # Append current page studies
    all_studies.extend(data.get("studies", []))
    
    # Check for next page
    next_token = data.get("nextPageToken")
    if not next_token:
        break
    
    # Set token for next page
    params["pageToken"] = next_token

# Store full JSON response
clinical_data = {"studies": all_studies}

# Display the results in JSON format
display(clinical_data)


{'studies': [{'protocolSection': {'identificationModule': {'nctId': 'NCT05212792',
     'orgStudyIdInfo': {'id': 'H21-03404'},
     'organization': {'fullName': 'University of British Columbia',
      'class': 'OTHER'},
     'briefTitle': 'Genomics and COVID-19 Vaccine Adverse Events',
     'officialTitle': 'Genomics of COVID-19 Vaccine-induced Adverse Events (Guillain-Barré Syndrome [GBS], Vaccine-induced Immune Thrombotic Thrombocytopenia [VITT]/thrombosis with Thrombocytopenia Syndrome [TTS], and Myocarditis/pericarditis)'},
    'statusModule': {'statusVerifiedDate': '2024-12',
     'overallStatus': 'RECRUITING',
     'expandedAccessInfo': {'hasExpandedAccess': False},
     'startDateStruct': {'date': '2022-06-24', 'type': 'ACTUAL'},
     'primaryCompletionDateStruct': {'date': '2025-12', 'type': 'ESTIMATED'},
     'completionDateStruct': {'date': '2025-12', 'type': 'ESTIMATED'},
     'studyFirstSubmitDate': '2022-01-25',
     'studyFirstSubmitQcDate': '2022-01-25',
     'studyFirst
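
The token-based loop in the fetch cell can be checked without network access. The following is a minimal sketch, assuming only that each response carries a `studies` list and an optional `nextPageToken`, as the cell above relies on. The names `iter_studies`, `fake_fetch`, and the canned pages are hypothetical stand-ins, not part of the ClinicalTrials.gov API.

```python
from typing import Callable, Iterator, Optional


def iter_studies(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield studies page by page until a response has no nextPageToken."""
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        yield from page.get("studies", [])
        token = page.get("nextPageToken")
        if not token:
            break


# Hypothetical stand-in for the HTTP call: two canned pages keyed by token.
# The nctId values are placeholders, not real trial identifiers.
_FAKE_PAGES = {
    None: {"studies": [{"nctId": "A"}, {"nctId": "B"}], "nextPageToken": "t1"},
    "t1": {"studies": [{"nctId": "C"}]},
}


def fake_fetch(token: Optional[str]) -> dict:
    return _FAKE_PAGES[token]


collected = list(iter_studies(fake_fetch))
print([study["nctId"] for study in collected])
```

In the notebook, `fetch_page` would wrap the `requests.get` call and pass the token as `pageToken`.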

In [0]:
# Fetching Google Search Results for Guillain-Barré Syndrome using SerpApi

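# %pip install google-search-results==2.4.2

import json
import os

from serpapi import GoogleSearch

# Read the SerpApi key from the environment rather than hard-coding it in the notebook.
# The variable name SERPAPI_API_KEY is an assumption; use whatever name your workspace defines.
api_key = os.environ["SERPAPI_API_KEY"]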

# To store all results
fetched_results_raw = []

page_size = 10
max_pages = 10

# Loop over pages to fetch results
for page in range(max_pages):
    params = {
        "api_key": api_key,
        "engine": "google",
        "q": "Guillain Barre Syndrome",
        "location": "India",
        "google_domain": "google.co.in",
        "gl": "in",
        "hl": "hi",
        "cr": "countryUS",
        "start": page * page_size
    }

    search = GoogleSearch(params)
    results = search.get_dict()

    fetched_results_raw += results.get("organic_results", [])

# Display the results in JSON format
print(json.dumps(fetched_results_raw, indent=2, ensure_ascii=False))


[
  {
    "position": 1,
    "title": "Guillain-Barre syndrome - Symptoms and causes",
    "link": "https://www.mayoclinic.org/diseases-conditions/guillain-barre-syndrome/symptoms-causes/syc-20362793",
    "redirect_link": "https://www.google.co.in/url?sa=t&source=web&rct=j&opi=89978449&url=https://www.mayoclinic.org/diseases-conditions/guillain-barre-syndrome/symptoms-causes/syc-20362793&ved=2ahUKEwiDqMejusONAxXK48kDHVAIC5IQFnoECBcQAQ",
    "displayed_link": "https://www.mayoclinic.org › ...",
    "favicon": "https://serpapi.com/searches/683595b9683863ff96684307/images/0d6cc8c82e5038e13c997e42de84add2ca730e8d348b329fa7508194cb2eb761.png",
    "date": "7 जून 2024",
    "snippet": "Guillain-Barre syndrome often begins with tingling and weakness starting in the feet and legs and spreading to the upper body and arms. Some ...",
    "snippet_highlighted_words": [
      "Guillain-Barre syndrome often begins with tingling and weakness"
    ],
    "source": "Mayo Clinic"
  },
  {
    "positio

**Convert Clinical Trials and SerpApi JSON Data to Pandas DataFrames (Normalization)**

In [0]:
# Convert Clinical Trials JSON Data to Pandas DataFrame for Analysis

import pandas as pd

# Reuse the studies collected by the paginated fetch above instead of
# re-requesting only the first page of results
studies = clinical_data.get("studies", [])

# Creating an empty list to collect trial records
trials_list = []

# Loop through each study
for study in studies:
    try:
        id_module = study.get('protocolSection', {}).get('identificationModule', {})
        status_module = study.get('protocolSection', {}).get('statusModule', {})

        trials_list.append({
            "nctId": id_module.get('nctId', ''),
            "briefTitle": id_module.get('briefTitle', ''),
            "overallStatus": status_module.get('overallStatus', ''),
            "hasResults": study.get('hasResults', False)
        })

    except Exception as e:
        print(f"Error processing a study: {e}")

# Convert to DataFrame
df = pd.DataFrame(trials_list)

# Display the DataFrame
display(df.head(100))






nctId,briefTitle,overallStatus,hasResults
NCT05212792,Genomics and COVID-19 Vaccine Adverse Events,RECRUITING,False
NCT04166357,Early Prediction of Respiratory and Autonomic Complications of GBS Using Neuromuscular Ultrasound,UNKNOWN,False
NCT04829526,Firm Observational Clinical Unicenter Study on Guillain Barré Syndrome,RECRUITING,False
NCT04303962,Efficacy of Intravenous Gamma Globulin on Guillain-Barre Syndrome,UNKNOWN,False
NCT00271791,Prednisone Treatment for Vestibular Neuronitis,COMPLETED,False
NCT01655394,Change of Nerve Conduction Properties in IVIg Dependent Neuropathies,UNKNOWN,False
NCT05324176,Diaphragm Thickness by Ultrasonography in Neurological Disorders,COMPLETED,False
NCT03943589,A Study of Imlifidase in Patients With Guillain-Barré Syndrome,COMPLETED,True
NCT04752566,A Study to Evaluate the Efficacy and Safety of Eculizumab in Guillain-Barré Syndrome,COMPLETED,True
NCT02493725,JET-GBS - Japanese Eculizumab Trial for GBS,COMPLETED,False
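
The flattening step chains `.get(..., {})` calls to reach nested fields. As a hedged alternative, the sketch below wraps that pattern in a small helper. `dig` is a hypothetical name introduced here for illustration. The sample record reuses field values shown in the raw output above.

```python
from typing import Any


def dig(record: dict, *keys: str, default: Any = None) -> Any:
    """Walk nested dictionaries, returning `default` if any key is missing."""
    current: Any = record
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Trimmed-down study record using values from the raw API output above
sample_study = {
    "protocolSection": {
        "identificationModule": {"nctId": "NCT05212792"},
        "statusModule": {"overallStatus": "RECRUITING"},
    }
}

row = {
    "nctId": dig(sample_study, "protocolSection", "identificationModule", "nctId", default=""),
    "overallStatus": dig(sample_study, "protocolSection", "statusModule", "overallStatus", default=""),
    "hasResults": dig(sample_study, "hasResults", default=False),
}
print(row)
```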


In [0]:
# Convert Google Search Results JSON Data to Pandas DataFrame for Analysis


import pandas as pd

try:
    # Keep the length check inside the try block so a missing variable is caught below
    print(f"Length of fetched_results_raw: {len(fetched_results_raw)}")

    serpapi_info = []

    for result in fetched_results_raw:
        serpapi_info.append({
            'Position': result.get('position', ''),
            'Title': result.get('title', ''),
            'Link': result.get('link', ''),
            'Snippet': result.get('snippet', ''),
            'Source': result.get('source', ''),
            'Displayed Link': result.get('displayed_link', ''),
            # Sitelinks is a nested dict; store it as a string to keep the table flat
            'Sitelinks': str(result.get('sitelinks', ''))
        })

    df_serpapi = pd.DataFrame(serpapi_info)
    display(df_serpapi.head(100))

except NameError:
    print("Variable 'fetched_results_raw' not found. Please run the fetching cell first.")




Length of fetched_results_raw: 99


Position,Title,Link,Snippet,Source,Displayed Link,Sitelinks
1,Guillain-Barre syndrome - Symptoms and causes,https://www.mayoclinic.org/diseases-conditions/guillain-barre-syndrome/symptoms-causes/syc-20362793,Guillain-Barre syndrome often begins with tingling and weakness starting in the feet and legs and spreading to the upper body and arms. Some ...,Mayo Clinic,https://www.mayoclinic.org › ...,
2,Guillain-Barré Syndrome (GBS): Symptoms & Treatment,https://my.clevelandclinic.org/health/diseases/15838-guillain-barre-syndrome,Guillain-Barré syndrome is a rare autoimmune condition in which your immune system attacks your peripheral nerves. It causes numbness and muscle weakness.,Cleveland Clinic,https://my.clevelandclinic.org › ...,
3,Guillain-Barré Syndrome,https://www.ninds.nih.gov/health-information/disorders/guillain-barre-syndrome,"Symptoms of Guillain-Barré syndrome · Difficulty with eye muscles and vision · Difficulty swallowing, speaking, or chewing · Pricking or pins ...",National Institute of Neurological Disorders and Stroke (.gov),https://www.ninds.nih.gov › ...,"{'inline': [{'title': 'What is Guillain-Barré...', 'link': 'https://www.ninds.nih.gov/health-information/disorders/guillain-barre-syndrome#toc-what-is-guillain-barr-syndrome-'}, {'title': 'How is Guillain-Barré...', 'link': 'https://www.ninds.nih.gov/health-information/disorders/guillain-barre-syndrome#toc-how-is-guillain-barr-syndrome-diagnosed-and-treated-'}, {'title': 'What are the latest updates on...', 'link': 'https://www.ninds.nih.gov/health-information/disorders/guillain-barre-syndrome#toc-what-are-the-latest-updates-on-guillain-barr-syndrome-'}]}"
4,Guillain–Barré syndrome,https://en.wikipedia.org/wiki/Guillain%E2%80%93Barr%C3%A9_syndrome,Guillain–Barré syndrome (GBS) is a rapid-onset muscle weakness caused by the immune system damaging the peripheral nervous system.,Wikipedia,https://en.wikipedia.org › ...,
5,Guillain–Barré syndrome,https://www.who.int/news-room/fact-sheets/detail/guillain-barr%C3%A9-syndrome,Guillain-Barré syndrome (GBS) is a rare condition in which a person's immune system attacks the peripheral nerves.,World Health Organization (WHO),https://www.who.int › ...,
6,Guillain-Barré Syndrome,https://www.hopkinsmedicine.org/health/conditions-and-diseases/guillainbarr-syndrome,Guillain-Barré syndrome (GBS) is a rare neurological disorder in which the body's immune system attacks the peripheral nervous system.,Johns Hopkins Medicine,https://www.hopkinsmedicine.org › ...,
7,Guillain-Barre syndrome - Diagnosis and treatment,https://www.mayoclinic.org/diseases-conditions/guillain-barre-syndrome/diagnosis-treatment/drc-20363006,"This rare autoimmune condition affects the nerves, causing weakness and tingling in the arms and legs that quickly spreads throughout the ...",Mayo Clinic,https://www.mayoclinic.org › ...,"{'inline': [{'title': 'Symptoms and causes', 'link': 'https://www.mayoclinic.org/diseases-conditions/guillain-barre-syndrome/symptoms-causes/syc-20362793'}, {'title': 'Doctors and departments', 'link': 'https://www.mayoclinic.org/diseases-conditions/guillain-barre-syndrome/doctors-departments/ddc-20363037'}, {'title': 'Care at Mayo Clinic', 'link': 'https://www.mayoclinic.org/diseases-conditions/guillain-barre-syndrome/care-at-mayo-clinic/mac-20363049'}]}"
8,Guillain-Barre Syndrome,https://www.physio-pedia.com/Guillain-Barre_Syndrome,Guillain-Barré syndrome (GBS) is a condition characterised by the autoimmune destruction of the peripheral sensory system.,Physiopedia,https://www.physio-pedia.com › ...,"{'inline': [{'title': 'Case Study', 'link': 'https://www.physio-pedia.com/Case_Study:_Guillain-Barre_Syndrome_(Sub-Acute)'}, {'title': 'Acute Motor Axonal...', 'link': 'https://www.physio-pedia.com/Acute_Motor_Axonal_Neuropathy_(AMAN),_a_Variant_of_Guillain-Barre_Syndrome:_A_Case_Study'}, {'title': 'Edit', 'link': 'https://www.physio-pedia.com/Guillain-Barre_Syndrome?veaction=edit'}]}"
9,"Guillain-Barre Syndrome: Practice Essentials, Background, ...",https://emedicine.medscape.com/article/315632-overview,Guillain-Barré syndrome (GBS) can be described as a collection of clinical syndromes that manifests as an acute inflammatory ...,Medscape,https://emedicine.medscape.com › ...,
10,Guillain-Barré Syndrome | Campylobacter,https://www.cdc.gov/campylobacter/signs-symptoms/guillain-barre-syndrome.html,Guillain-Barré (Ghee-YAN Bah-RAY) syndrome happens when a person's immune system harms their nerves. This harm causes muscle weakness and ...,Centers for Disease Control and Prevention | CDC (.gov),https://www.cdc.gov › ...,


**Defining PySpark Schemas and Creating Separate DataFrames for Clinical Trials and Google Search Results**

In [0]:
# Defining PySpark Schema and Creating Clinical Trials DataFrame
from pyspark.sql.types import StructType, StructField, StringType, BooleanType

clinical_schema = StructType([
    StructField("nctId", StringType(), True),
    StructField("briefTitle", StringType(), True),
    StructField("overallStatus", StringType(), True),
    StructField("hasResults", BooleanType(), True)
])

df_clinical_spark = spark.createDataFrame(trials_list, schema=clinical_schema)
display(df_clinical_spark)


nctId,briefTitle,overallStatus,hasResults
NCT05212792,Genomics and COVID-19 Vaccine Adverse Events,RECRUITING,False
NCT04166357,Early Prediction of Respiratory and Autonomic Complications of GBS Using Neuromuscular Ultrasound,UNKNOWN,False
NCT04829526,Firm Observational Clinical Unicenter Study on Guillain Barré Syndrome,RECRUITING,False
NCT04303962,Efficacy of Intravenous Gamma Globulin on Guillain-Barre Syndrome,UNKNOWN,False
NCT00271791,Prednisone Treatment for Vestibular Neuronitis,COMPLETED,False
NCT01655394,Change of Nerve Conduction Properties in IVIg Dependent Neuropathies,UNKNOWN,False
NCT05324176,Diaphragm Thickness by Ultrasonography in Neurological Disorders,COMPLETED,False
NCT03943589,A Study of Imlifidase in Patients With Guillain-Barré Syndrome,COMPLETED,True
NCT04752566,A Study to Evaluate the Efficacy and Safety of Eculizumab in Guillain-Barré Syndrome,COMPLETED,True
NCT02493725,JET-GBS - Japanese Eculizumab Trial for GBS,COMPLETED,False


In [0]:
# Defining PySpark Schema and Creating Google Search Results DataFrame

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

serpapi_schema = StructType([
    StructField("Position", IntegerType(), True),
    StructField("Title", StringType(), True),
    StructField("Link", StringType(), True),
    StructField("Snippet", StringType(), True),
    StructField("Source", StringType(), True),
    StructField("Displayed Link", StringType(), True),
    StructField("Sitelinks", StringType(), True)
])

df_serpapi_spark = spark.createDataFrame(df_serpapi, schema=serpapi_schema)
display(df_serpapi_spark)


Position,Title,Link,Snippet,Source,Displayed Link,Sitelinks
1,Guillain-Barre syndrome - Symptoms and causes,https://www.mayoclinic.org/diseases-conditions/guillain-barre-syndrome/symptoms-causes/syc-20362793,Guillain-Barre syndrome often begins with tingling and weakness starting in the feet and legs and spreading to the upper body and arms. Some ...,Mayo Clinic,https://www.mayoclinic.org › ...,
2,Guillain-Barré Syndrome (GBS): Symptoms & Treatment,https://my.clevelandclinic.org/health/diseases/15838-guillain-barre-syndrome,Guillain-Barré syndrome is a rare autoimmune condition in which your immune system attacks your peripheral nerves. It causes numbness and muscle weakness.,Cleveland Clinic,https://my.clevelandclinic.org › ...,
3,Guillain-Barré Syndrome,https://www.ninds.nih.gov/health-information/disorders/guillain-barre-syndrome,"Symptoms of Guillain-Barré syndrome · Difficulty with eye muscles and vision · Difficulty swallowing, speaking, or chewing · Pricking or pins ...",National Institute of Neurological Disorders and Stroke (.gov),https://www.ninds.nih.gov › ...,"{'inline': [{'title': 'What is Guillain-Barré...', 'link': 'https://www.ninds.nih.gov/health-information/disorders/guillain-barre-syndrome#toc-what-is-guillain-barr-syndrome-'}, {'title': 'How is Guillain-Barré...', 'link': 'https://www.ninds.nih.gov/health-information/disorders/guillain-barre-syndrome#toc-how-is-guillain-barr-syndrome-diagnosed-and-treated-'}, {'title': 'What are the latest updates on...', 'link': 'https://www.ninds.nih.gov/health-information/disorders/guillain-barre-syndrome#toc-what-are-the-latest-updates-on-guillain-barr-syndrome-'}]}"
4,Guillain–Barré syndrome,https://en.wikipedia.org/wiki/Guillain%E2%80%93Barr%C3%A9_syndrome,Guillain–Barré syndrome (GBS) is a rapid-onset muscle weakness caused by the immune system damaging the peripheral nervous system.,Wikipedia,https://en.wikipedia.org › ...,
5,Guillain–Barré syndrome,https://www.who.int/news-room/fact-sheets/detail/guillain-barr%C3%A9-syndrome,Guillain-Barré syndrome (GBS) is a rare condition in which a person's immune system attacks the peripheral nerves.,World Health Organization (WHO),https://www.who.int › ...,
6,Guillain-Barré Syndrome,https://www.hopkinsmedicine.org/health/conditions-and-diseases/guillainbarr-syndrome,Guillain-Barré syndrome (GBS) is a rare neurological disorder in which the body's immune system attacks the peripheral nervous system.,Johns Hopkins Medicine,https://www.hopkinsmedicine.org › ...,
7,Guillain-Barre syndrome - Diagnosis and treatment,https://www.mayoclinic.org/diseases-conditions/guillain-barre-syndrome/diagnosis-treatment/drc-20363006,"This rare autoimmune condition affects the nerves, causing weakness and tingling in the arms and legs that quickly spreads throughout the ...",Mayo Clinic,https://www.mayoclinic.org › ...,"{'inline': [{'title': 'Symptoms and causes', 'link': 'https://www.mayoclinic.org/diseases-conditions/guillain-barre-syndrome/symptoms-causes/syc-20362793'}, {'title': 'Doctors and departments', 'link': 'https://www.mayoclinic.org/diseases-conditions/guillain-barre-syndrome/doctors-departments/ddc-20363037'}, {'title': 'Care at Mayo Clinic', 'link': 'https://www.mayoclinic.org/diseases-conditions/guillain-barre-syndrome/care-at-mayo-clinic/mac-20363049'}]}"
8,Guillain-Barre Syndrome,https://www.physio-pedia.com/Guillain-Barre_Syndrome,Guillain-Barré syndrome (GBS) is a condition characterised by the autoimmune destruction of the peripheral sensory system.,Physiopedia,https://www.physio-pedia.com › ...,"{'inline': [{'title': 'Case Study', 'link': 'https://www.physio-pedia.com/Case_Study:_Guillain-Barre_Syndrome_(Sub-Acute)'}, {'title': 'Acute Motor Axonal...', 'link': 'https://www.physio-pedia.com/Acute_Motor_Axonal_Neuropathy_(AMAN),_a_Variant_of_Guillain-Barre_Syndrome:_A_Case_Study'}, {'title': 'Edit', 'link': 'https://www.physio-pedia.com/Guillain-Barre_Syndrome?veaction=edit'}]}"
9,"Guillain-Barre Syndrome: Practice Essentials, Background, ...",https://emedicine.medscape.com/article/315632-overview,Guillain-Barré syndrome (GBS) can be described as a collection of clinical syndromes that manifests as an acute inflammatory ...,Medscape,https://emedicine.medscape.com › ...,
10,Guillain-Barré Syndrome | Campylobacter,https://www.cdc.gov/campylobacter/signs-symptoms/guillain-barre-syndrome.html,Guillain-Barré (Ghee-YAN Bah-RAY) syndrome happens when a person's immune system harms their nerves. This harm causes muscle weakness and ...,Centers for Disease Control and Prevention | CDC (.gov),https://www.cdc.gov › ...,


**Cleaning and Preparation of Clinical Trials Data**

In [0]:
# Clinical Trials Data Summary Statistics
display(df_clinical_spark.describe())

summary,nctId,briefTitle,overallStatus
count,76,76,76
mean,,,
stddev,,,
min,NCT00004833,A Clinical Study of ANX005 and IVIG in Subjects With Guillain Barré Syndrome (GBS),ACTIVE_NOT_RECRUITING
max,NCT06940908,sCD163 as a Potential Biomarker in Guillain- Barré Syndrome,WITHDRAWN


In [0]:
# Print Schema of Clinical Trials DataFrame
df_clinical_spark.printSchema()

root
 |-- nctId: string (nullable = true)
 |-- briefTitle: string (nullable = true)
 |-- overallStatus: string (nullable = true)
 |-- hasResults: boolean (nullable = true)



In [0]:
# Check for Null Values in Clinical Trials DataFrame Columns

import pyspark.sql.functions as p
df_clinical_spark.select([
    p.count(p.when(p.col(c).isNull(), c)).alias(c) for c in df_clinical_spark.columns
]).display()


nctId,briefTitle,overallStatus,hasResults
0,0,0,0
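
A zero null count is expected here in part because the flattening cell substitutes `''` for missing keys, so absent values arrive as empty strings rather than nulls. The sketch below shows, in plain Python rather than PySpark, one way to count both cases per column. `count_missing` is a hypothetical helper, and the second and third sample rows are invented for illustration.

```python
from typing import Dict, Iterable, List


def count_missing(rows: Iterable[dict], columns: List[str]) -> Dict[str, Dict[str, int]]:
    """Count None values and blank strings separately for each column."""
    counts = {c: {"null": 0, "empty": 0} for c in columns}
    for row in rows:
        for c in columns:
            value = row.get(c)
            if value is None:
                counts[c]["null"] += 1
            elif isinstance(value, str) and value.strip() == "":
                counts[c]["empty"] += 1
    return counts


# Illustrative rows: the first mirrors a real record, the others are made up
sample_rows = [
    {"nctId": "NCT05212792", "briefTitle": "Genomics and COVID-19 Vaccine Adverse Events"},
    {"nctId": "PLACEHOLDER-1", "briefTitle": ""},
    {"nctId": None, "briefTitle": "   "},
]

missing = count_missing(sample_rows, ["nctId", "briefTitle"])
print(missing)
```

The same idea carries over to Spark by adding a condition on empty strings next to the `isNull()` check.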


In [0]:
# Check for Duplicate Rows in Clinical Trials DataFrame

print("Duplicates in Clinical Data:")
df_clinical_spark.count() - df_clinical_spark.dropDuplicates().count()


Duplicates in Clinical Data:
Out[10]: 0

**Cleaning and Preparation of Google Search Results Data**

In [0]:
# Summary Statistics of Google Search Results Data
display(df_serpapi_spark.describe())

summary,Position,Title,Link,Snippet,Source,Displayed Link,Sitelinks
count,99.0,99,99,99,99,99,99
mean,5.454545454545454,,,,,,
stddev,2.865248243235092,,,,,,
min,1.0,A rare paralysis syndrome is spiking in India this year. ...,https://ameripharmaspecialty.com/myasthenia-gravis/guillain-barre-syndrome-vs-myasthenia-gravis/,"(Acute Idiopathic Polyneuritis; Acute Inflammatory Demyelinating Polyradiculoneuropathy) ... Guillain-Barré syndrome is an acute, usually rapidly progressive but ...",Advocate Health Care,20 लाख+ व्यू · 4 वर्ष पहले,
max,10.0,When Your Child Has Guillain-Barre Syndrome (GBS),https://www.youtube.com/watch?v=KUEunZYZgII,noun ... Note: The cause of Guillain-Barré syndrome is unknown but individuals often experience onset a few weeks after a respiratory or gastrointestinal illness.,health.com,https://www.yalemedicine.org › ...,"{'inline': [{'title': 'What is Guillain-Barré...', 'link': 'https://www.ninds.nih.gov/health-information/disorders/guillain-barre-syndrome#toc-what-is-guillain-barr-syndrome-'}, {'title': 'How is Guillain-Barré...', 'link': 'https://www.ninds.nih.gov/health-information/disorders/guillain-barre-syndrome#toc-how-is-guillain-barr-syndrome-diagnosed-and-treated-'}, {'title': 'What are the latest updates on...', 'link': 'https://www.ninds.nih.gov/health-information/disorders/guillain-barre-syndrome#toc-what-are-the-latest-updates-on-guillain-barr-syndrome-'}]}"


In [0]:
# Print Schema of Google Search Results DataFrame
df_serpapi_spark.printSchema()


root
 |-- Position: integer (nullable = true)
 |-- Title: string (nullable = true)
 |-- Link: string (nullable = true)
 |-- Snippet: string (nullable = true)
 |-- Source: string (nullable = true)
 |-- Displayed Link: string (nullable = true)
 |-- Sitelinks: string (nullable = true)



In [0]:
# Check for Null Values in Google Search Results DataFrame

import pyspark.sql.functions as p

df_serpapi_spark.select([
    p.count(p.when(p.col(c).isNull(), c)).alias(c) for c in df_serpapi_spark.columns
]).display()


Position,Title,Link,Snippet,Source,Displayed_Link,Sitelinks
0,0,0,0,0,0,0


In [0]:
# Check for Duplicates in Google Search Results DataFrame

print("Duplicates in SERP API Data:")
df_serpapi_spark.count() - df_serpapi_spark.dropDuplicates().count()


Duplicates in SERP API Data:
Out[14]: 0

In [0]:
# Identify and Display Duplicate Rows in Google Search Results DataFrame

from pyspark.sql import functions as p

# Group by all columns and count occurrences
duplicate_rows = (
    df_serpapi_spark
    .groupBy(df_serpapi_spark.columns)
    .count()
    .filter("count > 1")
)

# Show the duplicate rows
duplicate_rows.display(truncate=False)


Position,Title,Link,Snippet,Source,Displayed_Link,Sitelinks,count


In [0]:
# Remove Duplicate Rows from Google Search Results DataFrame and Report Count

before = df_serpapi_spark.count()
df_serpapi_spark = df_serpapi_spark.dropDuplicates()
after = df_serpapi_spark.count()

print(f"Removed {before - after} duplicate rows")


Removed 0 duplicate rows
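
Because `dropDuplicates()` with no arguments compares every column, the same page returned at different positions on different result pages still counts as distinct. The sketch below illustrates, in plain Python, de-duplicating on the link alone while keeping the first occurrence. Whether that is the right key depends on the analysis. `dedupe_by_link` and the example.org URLs are hypothetical.

```python
from typing import Iterable, List


def dedupe_by_link(results: Iterable[dict]) -> List[dict]:
    """Keep the first result seen for each link, preserving input order."""
    seen = set()
    unique = []
    for result in results:
        link = result.get("link")
        if link in seen:
            continue
        seen.add(link)
        unique.append(result)
    return unique


# Invented results: the first and third share a link but differ in position
sample_results = [
    {"position": 1, "link": "https://example.org/gbs"},
    {"position": 2, "link": "https://example.org/gbs-treatment"},
    {"position": 4, "link": "https://example.org/gbs"},
]

unique_results = dedupe_by_link(sample_results)
print(len(sample_results) - len(unique_results), "duplicate link(s) removed")
```

In PySpark the comparable call would be `dropDuplicates(["Link"])`, which does not guarantee which of the duplicate rows is kept.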


In [0]:
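# Fill Missing Values in 'Sitelinks' Column of Google Search Results DataFrame
# Note: na.fill replaces only nulls. Missing sitelinks were stored as empty strings
# in the Pandas step (str('')), so those rows are left unchanged by this call.

df_serpapi_spark = df_serpapi_spark.na.fill({'Sitelinks': 'No Sitelinks'})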

In [0]:
# Identify Duplicate Rows in Google Search Results DataFrame

df_serpapi_spark.groupBy(df_serpapi_spark.columns) \
    .count().filter("count > 1").display()


Position,Title,Link,Snippet,Source,Displayed_Link,Sitelinks,count


In [0]:
# Count Null Values in Each Column of Google Search Results DataFrame

import pyspark.sql.functions as p

df_serpapi_spark.select([
    p.count(p.when(p.col(c).isNull(), c)).alias(c) for c in df_serpapi_spark.columns
]).display()


Position,Title,Link,Snippet,Source,Displayed_Link,Sitelinks
0,0,0,0,0,0,0


In [0]:
# Clean column names for df_serpapi_spark
df_serpapi_spark = df_serpapi_spark.withColumnRenamed("Displayed Link", "Displayed_Link")

In [0]:
# Display Cleaned Google Search Results DataFrame

display(df_serpapi_spark)

Position,Title,Link,Snippet,Source,Displayed_Link,Sitelinks
6,Guillain-Barré Syndrome,https://www.hopkinsmedicine.org/health/conditions-and-diseases/guillainbarr-syndrome,Guillain-Barré syndrome (GBS) is a rare neurological disorder in which the body's immune system attacks the peripheral nervous system.,Johns Hopkins Medicine,https://www.hopkinsmedicine.org › ...,
1,Guillain-Barré Syndrome | Campylobacter,https://www.cdc.gov/campylobacter/signs-symptoms/guillain-barre-syndrome.html,Guillain-Barré (Ghee-YAN Bah-RAY) syndrome happens when a person's immune system harms their nerves. This harm causes muscle weakness and ...,Centers for Disease Control and Prevention | CDC (.gov),https://www.cdc.gov › ...,
5,Guillain–Barré syndrome,https://www.who.int/news-room/fact-sheets/detail/guillain-barr%C3%A9-syndrome,Guillain-Barré syndrome (GBS) is a rare condition in which a person's immune system attacks the peripheral nerves.,World Health Organization (WHO),https://www.who.int › ...,
2,Guillain-Barré Syndrome (GBS): Symptoms & Treatment,https://my.clevelandclinic.org/health/diseases/15838-guillain-barre-syndrome,Guillain-Barré syndrome is a rare autoimmune condition in which your immune system attacks your peripheral nerves. It causes numbness and muscle weakness.,Cleveland Clinic,https://my.clevelandclinic.org › ...,
1,Guillain-Barre syndrome - Symptoms and causes,https://www.mayoclinic.org/diseases-conditions/guillain-barre-syndrome/symptoms-causes/syc-20362793,Guillain-Barre syndrome often begins with tingling and weakness starting in the feet and legs and spreading to the upper body and arms. Some ...,Mayo Clinic,https://www.mayoclinic.org › ...,
7,Guillain-Barre syndrome - Diagnosis and treatment,https://www.mayoclinic.org/diseases-conditions/guillain-barre-syndrome/diagnosis-treatment/drc-20363006,"This rare autoimmune condition affects the nerves, causing weakness and tingling in the arms and legs that quickly spreads throughout the ...",Mayo Clinic,https://www.mayoclinic.org › ...,"{'inline': [{'title': 'Symptoms and causes', 'link': 'https://www.mayoclinic.org/diseases-conditions/guillain-barre-syndrome/symptoms-causes/syc-20362793'}, {'title': 'Doctors and departments', 'link': 'https://www.mayoclinic.org/diseases-conditions/guillain-barre-syndrome/doctors-departments/ddc-20363037'}, {'title': 'Care at Mayo Clinic', 'link': 'https://www.mayoclinic.org/diseases-conditions/guillain-barre-syndrome/care-at-mayo-clinic/mac-20363049'}]}"
3,Guillain-Barré Syndrome,https://www.ninds.nih.gov/health-information/disorders/guillain-barre-syndrome,"Symptoms of Guillain-Barré syndrome · Difficulty with eye muscles and vision · Difficulty swallowing, speaking, or chewing · Pricking or pins ...",National Institute of Neurological Disorders and Stroke (.gov),https://www.ninds.nih.gov › ...,"{'inline': [{'title': 'What is Guillain-Barré...', 'link': 'https://www.ninds.nih.gov/health-information/disorders/guillain-barre-syndrome#toc-what-is-guillain-barr-syndrome-'}, {'title': 'How is Guillain-Barré...', 'link': 'https://www.ninds.nih.gov/health-information/disorders/guillain-barre-syndrome#toc-how-is-guillain-barr-syndrome-diagnosed-and-treated-'}, {'title': 'What are the latest updates on...', 'link': 'https://www.ninds.nih.gov/health-information/disorders/guillain-barre-syndrome#toc-what-are-the-latest-updates-on-guillain-barr-syndrome-'}]}"
4,Guillain–Barré syndrome,https://en.wikipedia.org/wiki/Guillain%E2%80%93Barr%C3%A9_syndrome,Guillain–Barré syndrome (GBS) is a rapid-onset muscle weakness caused by the immune system damaging the peripheral nervous system.,Wikipedia,https://en.wikipedia.org › ...,
8,Guillain-Barre Syndrome,https://www.physio-pedia.com/Guillain-Barre_Syndrome,Guillain-Barré syndrome (GBS) is a condition characterised by the autoimmune destruction of the peripheral sensory system.,Physiopedia,https://www.physio-pedia.com › ...,"{'inline': [{'title': 'Case Study', 'link': 'https://www.physio-pedia.com/Case_Study:_Guillain-Barre_Syndrome_(Sub-Acute)'}, {'title': 'Acute Motor Axonal...', 'link': 'https://www.physio-pedia.com/Acute_Motor_Axonal_Neuropathy_(AMAN),_a_Variant_of_Guillain-Barre_Syndrome:_A_Case_Study'}, {'title': 'Edit', 'link': 'https://www.physio-pedia.com/Guillain-Barre_Syndrome?veaction=edit'}]}"
2,Guillain-Barré syndrome Information - Mount Sinai,https://www.mountsinai.org/health-library/diseases-conditions/guillain-barr-syndrome,"GBS damages parts of nerves. This nerve damage causes tingling, muscle weakness, loss of balance, and paralysis. GBS most often affects the nerve covering ( ...",Mount Sinai,https://www.mountsinai.org › ...,


**Exploratory Data Analysis (EDA) of Clinical Trials and Google Search Data**

**Distribution of Study Status in Clinical Trials (overallStatus)**

In [0]:
# Understand the distribution of studies by status — e.g., completed, recruiting, withdrawn, etc.

df_clinical_spark.groupBy("overallStatus") \
    .count() \
    .orderBy("count", ascending=False) \
    .display(truncate=False)


overallStatus,count
COMPLETED,31
UNKNOWN,16
RECRUITING,12
NOT_YET_RECRUITING,7
WITHDRAWN,5
ENROLLING_BY_INVITATION,2
TERMINATED,2
ACTIVE_NOT_RECRUITING,1


**Percentage of studies with results vs without results**

In [0]:
# Quick insight into how many studies have published results — and whether most clinical trials on GBS report their findings
from pyspark.sql.functions import col

total_studies = df_clinical_spark.count()
results_group = df_clinical_spark.groupBy("hasResults").count()

results_group.withColumn(
    "percentage", 
    (col("count") / total_studies * 100).cast("decimal(5,2)")
).display()


hasResults,count,percentage
True,4,5.26
False,72,94.74
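
The percentages above can be reproduced by hand from the counts in the table. The sketch below does the arithmetic with Python's `decimal` module, rounding to two places to mirror the `decimal(5,2)` cast. `share` is a hypothetical helper name.

```python
from decimal import Decimal, ROUND_HALF_UP


def share(count: int, total: int) -> Decimal:
    """Return count/total as a percentage rounded to two decimal places."""
    percentage = Decimal(count) / Decimal(total) * 100
    return percentage.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Counts taken from the output table above: 4 with results, 72 without, 76 in total
with_results = share(4, 76)
without_results = share(72, 76)
print(with_results, without_results)
```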


**Filter studies with results for focused analysis**

In [0]:
# Filter and preview studies that have published results for further analysis
df_with_results = df_clinical_spark.filter(col("hasResults") == True)
print(f"Studies with results: {df_with_results.count()}")
df_with_results.display(5, truncate=False)



Studies with results: 4


nctId,briefTitle,overallStatus,hasResults
NCT03943589,A Study of Imlifidase in Patients With Guillain-Barré Syndrome,COMPLETED,True
NCT04752566,A Study to Evaluate the Efficacy and Safety of Eculizumab in Guillain-Barré Syndrome,COMPLETED,True
NCT00411216,Recovery of Visual Acuity in People With Vestibular Deficits,COMPLETED,True
NCT04053452,Peripheral Nerve Ultrasound for Diagnosis and Prognosis of Guillain-Barre Syndrome,TERMINATED,True


**Exploratory Data Analysis of Google Search Results**

**Source-wise Result Count with Best Position**

In [0]:
# Source-wise count of Google Search results and their best positions

# Alias Spark's min so it does not shadow Python's built-in min()
from pyspark.sql.functions import col, count, min as spark_min

df_serpapi_spark.groupBy("Source") \
    .agg(
        count("*").alias("No_of_Results"),
        spark_min("Position").alias("Best_Position")  # Lower means higher rank
    ) \
    .orderBy(col("No_of_Results").desc(), col("Best_Position")) \
    .display(truncate=False)


Source,No_of_Results,Best_Position
Centers for Disease Control and Prevention | CDC (.gov),3,1
Mayo Clinic,2,1
EyeWiki,2,1
adhikarilifeline.com,2,1
Emory Healthcare,2,1
The Lancet,2,1
Shirley Ryan AbilityLab,2,1
Wolters Kluwer,2,2
EMCrit Blog,2,3
Indian Academy of Pediatrics (IAP),2,4


**Which Titles Appear Most Frequently in the Top 3 Positions, and Which Sources Do They Come From?**

In [0]:
# Most frequent Titles in top 3 positions along with their Sources

from pyspark.sql.functions import col, count, collect_set

# Filter top 3 positions
top_positions_df = df_serpapi_spark.filter(col("Position") <= 3)

# Group by Title and collect the Sources
top_titles_with_sources = top_positions_df.groupBy("Title") \
    .agg(
        count("*").alias("Count"),
        collect_set("Source").alias("Top_Sources")
    ) \
    .orderBy(col("Count").desc())

top_titles_with_sources.display(truncate=False)



Title,Count,Top_Sources
Guillain-Barré Syndrome,4,"List(National Institute of Neurological Disorders and Stroke (.gov), Temple Health, Medcomic, Mayo Clinic Proceedings)"
Guillain-Barré Syndrome | McGovern Medical School,1,List(UTHealth Houston)
Guillain-Barre Syndrome,1,List(Aurora Health Care)
"Guillain-Barré syndrome: Types, Symptoms, Causes & ...",1,List(PACE Hospitals)
Guillain-Barré Syndrome (GBS) and Vaccines,1,List(Centers for Disease Control and Prevention | CDC (.gov))
Guillain Barre Syndrome (GBS),1,List(EMCrit Blog)
Listing: Guillain-Barré Syndrome Resources,1,List(Shirley Ryan AbilityLab)
Guillain–Barré syndrome outbreak in Pune - The Lancet,1,List(The Lancet)
India faces Guillain-Barré outbreak,1,List(aviNews)
Miller Fisher variant of Guillain-Barre Syndrome,1,List(EyeWiki)


**Topic Categorization of Google Search Snippets**

In [0]:
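# Categorize search result snippets into medical topics (Symptoms, Causes, Treatment, etc.) and count occurrences.
# The when() chain assigns only the first matching topic, so a snippet that mentions several topics is counted once.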

from pyspark.sql.functions import when, col

df_tagged = df_serpapi_spark.withColumn("Topic",
    when(col("Snippet").rlike("(?i)symptom"), "Symptoms")
    .when(col("Snippet").rlike("(?i)cause|trigger"), "Causes")
    .when(col("Snippet").rlike("(?i)treat|therapy|manage"), "Treatment")
    .when(col("Snippet").rlike("(?i)recover|rehab"), "Recovery")
    .when(col("Snippet").rlike("(?i)diagnos"), "Diagnosis")
    .otherwise("Other")
)

df_tagged.groupBy("Topic").count().orderBy("count", ascending=False).display()


Topic,count
Other,54
Symptoms,20
Causes,14
Treatment,5
Diagnosis,3
Recovery,3
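
Because the `when()` chain stops at the first rule that matches, the order of the rules affects the counts: a snippet that mentions both symptoms and treatment is counted under Symptoms only. The sketch below mirrors the same keyword rules in plain Python to make that behavior visible. `first_topic` and `all_topics` are hypothetical helpers, and the example snippet is illustrative rather than taken from the dataset.

```python
import re

# Same keyword rules as the when() chain, in the same order
TOPIC_RULES = [
    ("Symptoms", re.compile(r"symptom", re.IGNORECASE)),
    ("Causes", re.compile(r"cause|trigger", re.IGNORECASE)),
    ("Treatment", re.compile(r"treat|therapy|manage", re.IGNORECASE)),
    ("Recovery", re.compile(r"recover|rehab", re.IGNORECASE)),
    ("Diagnosis", re.compile(r"diagnos", re.IGNORECASE)),
]

def first_topic(snippet):
    """Return the first matching topic, mirroring the when() chain."""
    for topic, pattern in TOPIC_RULES:
        if pattern.search(snippet):
            return topic
    return "Other"

def all_topics(snippet):
    """Hypothetical multi-label variant: return every matching topic."""
    matches = [topic for topic, pattern in TOPIC_RULES if pattern.search(snippet)]
    return matches or ["Other"]

# Illustrative snippet, not taken from the dataset
example = "Learn about symptoms, causes, diagnosis and treatment options."
print(first_topic(example))
print(all_topics(example))
```

A multi-label variant like `all_topics` would let one snippet contribute to several topics, at the cost of counts that no longer sum to the number of snippets.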


**Saving Clinical Trials and Google Search Results DataFrames as Spark Tables**

In [0]:
# Saving Clinical Trials DataFrame as a Spark Table
df_clinical_spark.write.mode("overwrite").saveAsTable("final_clinical_data")


In [0]:
%sql
select * from final_clinical_data

nctId,briefTitle,overallStatus,hasResults
NCT02029378,Inhibition of Complement Activation (Eculizumab) in Guillain-Barre Syndrome Study,UNKNOWN,False
NCT01306578,Intravenous Immunoglobulin (IVIG) Versus Plasma Exchange (PE) for Ventilated Children With Guillain Barre Syndrome (GBS),COMPLETED,False
NCT03710278,The Effectiveness and Safety of Human Lumbar Puncture Assist Device (LPat),COMPLETED,False
NCT03773328,A Clinical Trial of CK0801 (a New Drug) In Patients With Treatment-Resistant Guillain-Barré Syndrome (GBS),WITHDRAWN,False
NCT03840928,PatientSpot Formerly Known as ArthritisPower,RECRUITING,False
NCT00004833,Randomized Study of Plasmapheresis or Human Immunoglobulin Infusion in Childhood Guillain-Barre Syndrome,TERMINATED,False
NCT06200454,Predictive Value of Neuromuscular Ultrasound of Cranial Nerves in Guillain-Barré Syndrome,NOT_YET_RECRUITING,False
NCT04927598,Predictors and Prognostic Factors of Gullian Barrie Syndrome Outcome,COMPLETED,False
NCT05461898,RehabGBs: Rehabilitation in People With Guillain-Barré Syndrome,RECRUITING,False
NCT02582853,sCD163 as a Potential Biomarker in Guillain- Barré Syndrome,UNKNOWN,False


In [0]:
# Saving Google Search Results DataFrame as a Spark Table
df_serpapi_spark.write.mode("overwrite").saveAsTable("final_serpapi_data")


In [0]:
%sql
select * from final_serpapi_data

Position,Title,Link,Snippet,Source,Displayed_Link,Sitelinks
6,Guillain-Barré Syndrome,https://www.hopkinsmedicine.org/health/conditions-and-diseases/guillainbarr-syndrome,Guillain-Barré syndrome (GBS) is a rare neurological disorder in which the body's immune system attacks the peripheral nervous system.,Johns Hopkins Medicine,https://www.hopkinsmedicine.org › ...,
1,Guillain-Barré Syndrome | Campylobacter,https://www.cdc.gov/campylobacter/signs-symptoms/guillain-barre-syndrome.html,Guillain-Barré (Ghee-YAN Bah-RAY) syndrome happens when a person's immune system harms their nerves. This harm causes muscle weakness and ...,Centers for Disease Control and Prevention | CDC (.gov),https://www.cdc.gov › ...,
5,Guillain–Barré syndrome,https://www.who.int/news-room/fact-sheets/detail/guillain-barr%C3%A9-syndrome,Guillain-Barré syndrome (GBS) is a rare condition in which a person's immune system attacks the peripheral nerves.,World Health Organization (WHO),https://www.who.int › ...,
2,Guillain-Barré Syndrome (GBS): Symptoms & Treatment,https://my.clevelandclinic.org/health/diseases/15838-guillain-barre-syndrome,Guillain-Barré syndrome is a rare autoimmune condition in which your immune system attacks your peripheral nerves. It causes numbness and muscle weakness.,Cleveland Clinic,https://my.clevelandclinic.org › ...,
1,Guillain-Barre syndrome - Symptoms and causes,https://www.mayoclinic.org/diseases-conditions/guillain-barre-syndrome/symptoms-causes/syc-20362793,Guillain-Barre syndrome often begins with tingling and weakness starting in the feet and legs and spreading to the upper body and arms. Some ...,Mayo Clinic,https://www.mayoclinic.org › ...,
7,Guillain-Barre syndrome - Diagnosis and treatment,https://www.mayoclinic.org/diseases-conditions/guillain-barre-syndrome/diagnosis-treatment/drc-20363006,"This rare autoimmune condition affects the nerves, causing weakness and tingling in the arms and legs that quickly spreads throughout the ...",Mayo Clinic,https://www.mayoclinic.org › ...,"{'inline': [{'title': 'Symptoms and causes', 'link': 'https://www.mayoclinic.org/diseases-conditions/guillain-barre-syndrome/symptoms-causes/syc-20362793'}, {'title': 'Doctors and departments', 'link': 'https://www.mayoclinic.org/diseases-conditions/guillain-barre-syndrome/doctors-departments/ddc-20363037'}, {'title': 'Care at Mayo Clinic', 'link': 'https://www.mayoclinic.org/diseases-conditions/guillain-barre-syndrome/care-at-mayo-clinic/mac-20363049'}]}"
3,Guillain-Barré Syndrome,https://www.ninds.nih.gov/health-information/disorders/guillain-barre-syndrome,"Symptoms of Guillain-Barré syndrome · Difficulty with eye muscles and vision · Difficulty swallowing, speaking, or chewing · Pricking or pins ...",National Institute of Neurological Disorders and Stroke (.gov),https://www.ninds.nih.gov › ...,"{'inline': [{'title': 'What is Guillain-Barré...', 'link': 'https://www.ninds.nih.gov/health-information/disorders/guillain-barre-syndrome#toc-what-is-guillain-barr-syndrome-'}, {'title': 'How is Guillain-Barré...', 'link': 'https://www.ninds.nih.gov/health-information/disorders/guillain-barre-syndrome#toc-how-is-guillain-barr-syndrome-diagnosed-and-treated-'}, {'title': 'What are the latest updates on...', 'link': 'https://www.ninds.nih.gov/health-information/disorders/guillain-barre-syndrome#toc-what-are-the-latest-updates-on-guillain-barr-syndrome-'}]}"
4,Guillain–Barré syndrome,https://en.wikipedia.org/wiki/Guillain%E2%80%93Barr%C3%A9_syndrome,Guillain–Barré syndrome (GBS) is a rapid-onset muscle weakness caused by the immune system damaging the peripheral nervous system.,Wikipedia,https://en.wikipedia.org › ...,
8,Guillain-Barre Syndrome,https://www.physio-pedia.com/Guillain-Barre_Syndrome,Guillain-Barré syndrome (GBS) is a condition characterised by the autoimmune destruction of the peripheral sensory system.,Physiopedia,https://www.physio-pedia.com › ...,"{'inline': [{'title': 'Case Study', 'link': 'https://www.physio-pedia.com/Case_Study:_Guillain-Barre_Syndrome_(Sub-Acute)'}, {'title': 'Acute Motor Axonal...', 'link': 'https://www.physio-pedia.com/Acute_Motor_Axonal_Neuropathy_(AMAN),_a_Variant_of_Guillain-Barre_Syndrome:_A_Case_Study'}, {'title': 'Edit', 'link': 'https://www.physio-pedia.com/Guillain-Barre_Syndrome?veaction=edit'}]}"
2,Guillain-Barré syndrome Information - Mount Sinai,https://www.mountsinai.org/health-library/diseases-conditions/guillain-barr-syndrome,"GBS damages parts of nerves. This nerve damage causes tingling, muscle weakness, loss of balance, and paralysis. GBS most often affects the nerve covering ( ...",Mount Sinai,https://www.mountsinai.org › ...,


**Derived Results Summary**

**Clinical Trials Data Insights:**

- The majority of clinical trials related to Guillain-Barré Syndrome (GBS) are completed (31 studies), while many of the remaining trials have an overall status of UNKNOWN or RECRUITING.

- Only about 5.26% of the studies have posted results (`hasResults` is True), so most GBS trials in the dataset have not yet reported their findings on ClinicalTrials.gov.

- A filtered subset of studies with results was created for focused analysis, highlighting the available data for deeper examination.

**SerpApi Search Results Insights:**

- The most frequent sources in the Google search results include authoritative health organizations such as the Centers for Disease Control and Prevention (CDC, 3 results), Mayo Clinic, and The Lancet (2 results each), all with a best position of 1.

- Titles related to Guillain-Barré Syndrome appearing in the top 3 search positions are mostly from reputable medical institutions, emphasizing the dominance of credible sources in search results.

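- More than half of the search snippets (54 of 99) match none of the keyword rules and fall under Other. Among the categorized snippets, Symptoms leads (20), followed by Causes (14) and Treatment (5), suggesting that the available information centers on recognizing and understanding the condition.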
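
To put the snippet breakdown in context, the topic counts from the categorization output can be converted into percentage shares. The sketch below copies those counts by hand; the variable names are illustrative and not part of the notebook.

```python
# Topic counts copied from the categorization output
topic_counts = {
    "Other": 54,
    "Symptoms": 20,
    "Causes": 14,
    "Treatment": 5,
    "Diagnosis": 3,
    "Recovery": 3,
}

total = sum(topic_counts.values())
categorized = total - topic_counts["Other"]

# Percentage share of each topic, rounded to one decimal place
shares = {topic: round(100 * n / total, 1) for topic, n in topic_counts.items()}

for topic, pct in shares.items():
    print(f"{topic}: {pct}%")
print(f"Categorized snippets: {categorized} of {total}")
```

Reading the shares alongside the raw counts makes it clear how much of the corpus the keyword rules actually cover.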