
# 📓 CV PDF → CSV Extraction (Tutorial Version)

This notebook extracts structured information from **your 3 CV PDFs** and saves the results into a single CSV.  
It uses **pdfplumber** (no OCR) and includes enhanced parsing logic for LinkedIn/GitHub mentions that **don't** include full URLs.

**Output columns:**  
`["file_name","name","email","phone","linkedin","github","portfolio","other_links","raw_text"]`


## 1) Install & Import Libraries

In [24]:

%pip install -q pdfplumber pandas

import re
from pathlib import Path
import pandas as pd
import pdfplumber
from IPython.display import display, HTML


Note: you may need to restart the kernel to use updated packages.


## 2) Configuration (ONLY your 3 CV PDFs)

In [25]:
from pathlib import Path

# 👇 Your local folder
INPUT_DIR = Path(r"C:\Users\bbuser\Downloads")

# 👇 Your CV filenames
PDF_FILES = [
    "Duaa Hilal Al-Hashmi CV 2025.pdf",
    "Nusiba_Alnabhani_cv (1).pdf",
    "Janna Khalid Al Balushi Resume 14-09-25.pdf",
]

pdfs = [INPUT_DIR / f for f in PDF_FILES if (INPUT_DIR / f).exists()]
print(f"Found {len(pdfs)} CV PDFs:", [p.name for p in pdfs])


# Output CSV path
OUTPUT_CSV = str(INPUT_DIR / "cv_data_extracted_final.csv")

# Options
SAVE_RAW_TEXT = True
VERBOSE = True


Found 3 CV PDFs: ['Duaa Hilal Al-Hashmi CV 2025.pdf', 'Nusiba_Alnabhani_cv (1).pdf', 'Janna Khalid Al Balushi Resume 14-09-25.pdf']


## 3) Helper Functions (regex + extractors)

In [26]:
# Regex patterns
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
PHONE_RE = re.compile(r"""
    (?<!\w)(?:\+?\d{1,3}[\s\-\.]?)?(?:\(?\d{2,4}\)?[\s\-\.]?)\d{3,4}[\s\-\.]?\d{3,4}(?!\w)
""", re.VERBOSE)
LINK_RE = re.compile(r"https?://[^\s\)\]]+", re.IGNORECASE)

def clean_line(s: str) -> str:
    return re.sub(r"\s+", " ", s).strip()

def extract_text_from_pdf(path: str) -> str:
    """Extract text using pdfplumber (robust for typical CV formatting)."""
    text = ""
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            page_text = page.extract_text() or ""
            text += page_text + "\n"
    return text

def guess_name_from_top(text: str, max_chars: int = 80) -> str:
    """Heuristic to guess full name from the top section of the CV."""
    lines = [clean_line(x) for x in text.splitlines() if clean_line(x)]
    for line in lines[:15]:
        if len(line) > max_chars:
            continue
        if EMAIL_RE.search(line) or LINK_RE.search(line) or PHONE_RE.search(line):
            continue
        # All-caps header like "DUAA HILAL AL-HASHMI"
        if line.isupper() and 1 <= len(line.split()) <= 5:
            return line.title()
        # Mostly letters & reasonable number of words
        letters_ratio = sum(ch.isalpha() or ch in " -.\'’" for ch in line) / max(1, len(line))
        words = line.split()
        if letters_ratio > 0.8 and 1 <= len(words) <= 5:
            return " ".join(w.capitalize() for w in words)
    return ""

def extract_linkedin(text: str, links: list[str]) -> str:
    # Prefer actual URLs first
    for l in links:
        if "linkedin.com" in l.lower():
            return l.strip()
    # Then pattern-based username capture
    m = re.search(r"(?i)linkedin\s*[:\-]?\s*([A-Za-z0-9\-\_\/\.]+)", text)
    if m:
        username = m.group(1).strip().strip('.').strip('/')
        if not username.startswith("http"):
            return f"https://www.linkedin.com/in/{username}"
    # If explicitly says LinkedIn Profile but no link
    if re.search(r"(?i)linkedin profile", text):
        return "LinkedIn Profile (no URL found)"
    return ""

def extract_github(text: str, links: list[str]) -> str:
    # Prefer actual URLs first
    for l in links:
        if "github.com" in l.lower():
            return l.strip()
    # Then pattern-based username capture
    m = re.search(r"(?i)github\s*[:\-]?\s*([A-Za-z0-9\-\_\/\.]+)", text)
    if m:
        username = m.group(1).strip().strip('.').strip('/')
        if not username.startswith("http"):
            return f"https://github.com/{username}"
    # If explicitly says GitHub Profile but no link
    if re.search(r"(?i)github profile", text):
        return "GitHub Profile (no URL found)"
    return ""


## 4) Sanity Check: Preview one CV's raw text (first 800 chars)

In [27]:

if pdfs:
    sample_text = extract_text_from_pdf(str(pdfs[0]))
    print(sample_text[:800])
else:
    print("No CV PDFs found. Check filenames and INPUT_DIR.")


Duaa Hilal Al-Hashmi
Al Batinah North | P: +968 91799668 | Duaahil44@gmail.com
EDUCATION
CODELINE
Data Science
Graduation Date: October 2025
SAUDI DIGITAL ACADEMY
FullStack Java Web Development Bootcamp
Graduation Date: December 2023
UNIVERSITY OF TECHNOLOGY AND APPLIED SCIENCES
Bachelor of Electronics and Telecommunications Engineering
Cumulative GPA: 3.2/4.0, Graduation Date: August 2022
EXPERIENCE
Telecommunications Regulatory Authority.
Muscat —Internship
● Completed the Nafath Training Program, acquiring expertise in programming languages (Java, JavaScript, Python, SQL),
frameworks (Spring Boot, Swagger-UI, Angular), and software design principles (RESTful APIs, SOLID)
● Applied skills in debugging tools (Postman, IntelliJ Debugger), IDEs (IntelliJ IDEA, Visual Studio Code), and compr


## 5) Extract Data From All 3 CV PDFs

In [28]:
rows = []
for p in pdfs:
    text = extract_text_from_pdf(str(p))

    if SAVE_RAW_TEXT:
        try:
            (p.parent / (p.stem + ".txt")).write_text(text, encoding="utf-8", errors="ignore")
        except Exception as e:
            if VERBOSE: print(f"[warn] cannot save raw text for {p.name}: {e}")

    emails = EMAIL_RE.findall(text)
    phones = PHONE_RE.findall(text)
    links = LINK_RE.findall(text)

    linkedin = extract_linkedin(text, links)
    github = extract_github(text, links)
    portfolio = next((u for u in links if any(k in u.lower() for k in ["portfolio","behance.net","dribbble.com","notion.site","about.me"])), "")
    other_links = [re.sub(r"[\)\]]+$", "", u) for u in links if u not in {linkedin, github, portfolio}]

    m = re.search(r"(?i)\bname\s*[:\-]\s*([^\n]+)", text)
    name = clean_line(m.group(1)) if m else guess_name_from_top(text)

    rows.append({
        "file_name": p.name,
        "name": name,
        "email": emails[0] if emails else "",
        "phone": phones[0] if phones else "",
        "linkedin": linkedin,
        "github": github,
        "portfolio": portfolio,
        "other_links": "; ".join(other_links[:8]),
        "raw_text": text
    })

df = pd.DataFrame(rows, columns=["file_name","name","email","phone","linkedin","github","portfolio","other_links","raw_text"])
print(f"Extracted {len(df)} row(s).")
df.head(10)


Could get FontBBox from font descriptor because None cannot be parsed as 4 floats


Extracted 3 row(s).


Unnamed: 0,file_name,name,email,phone,linkedin,github,portfolio,other_links,raw_text
0,Duaa Hilal Al-Hashmi CV 2025.pdf,Duaa Hilal Al-hashmi,Duaahil44@gmail.com,+968 91799668,,https://github.com/Duaa3/Hotel-Managements-Project-Frontend,,https://github.com/Duaa3/Hotel-Managements-Project-Backend-; https://www.researchgate.net/publication/369452982_Date_fruit_classification_and_sorting_system_using_Artificial_; https://ijrar.org/viewfulltext.php?&p_id=IJRAR21D2332,"Duaa Hilal Al-Hashmi\nAl Batinah North | P: +968 91799668 | Duaahil44@gmail.com\nEDUCATION\nCODELINE\nData Science\nGraduation Date: October 2025\nSAUDI DIGITAL ACADEMY\nFullStack Java Web Development Bootcamp\nGraduation Date: December 2023\nUNIVERSITY OF TECHNOLOGY AND APPLIED SCIENCES\nBachelor of Electronics and Telecommunications Engineering\nCumulative GPA: 3.2/4.0, Graduation Date: August 2022\nEXPERIENCE\nTelecommunications Regulatory Authority.\nMuscat —Internship\n● Completed the Nafath Training Program, acquiring expertise in programming languages (Java, JavaScript, Python, SQL),\nframeworks (Spring Boot, Swagger-UI, Angular), and software design principles (RESTful APIs, SOLID)\n● Applied skills in debugging tools (Postman, IntelliJ Debugger), IDEs (IntelliJ IDEA, Visual Studio Code), and comprehensive\nconcepts (authentication, network protocols, web security) within a structured 3-tier application framework\n● Gained experience in version control systems such as Git and GitHub for collaborative code management, adeptly employing\nAgile methodologies like Scrum, Trello, and Slack for efficient project organization and communication\nUNIVERSITY OF TECHNOLOGY AND APPLIED SCIENCES\nAl Musanna— Research Assistant\nNovember 2021- August 2022\n● Drove the design and development of an IoT-based system for post-harvest data collection in Oman\n● Implemented image processing and AI techniques; successful collaboration with a diverse team of researchers improved the\nsystem, contributing to socioeconomic development\n● Played an integral role in co-authoring and publishing two highly impactful research papers\nSAID BIN SULTAN NAVAL BASE, ROYAL NAVY OF OMAN\nAl Musanna —Internship\nJune 2022- July 2022\n● Engaged in a comprehensive 40-day internship at Said Bin Sultan Naval Base, providing input and contributing to\nworkshops aimed at enhancing health and safety practices\n● Optimized radar systems, improving intercom efficiency, advancing ESM and ECM capabilities\n● Streamlined external communication protocols, enhanced network, and software performance implemented PCB\nenhancements, refined naval calibration procedures, and drove innovation in underwater systems\nPROJECTS\nSMART OFFICE PLANNER\nJuly 2025\n● Hybrid‑work optimization prototype: rule‑based + ML scheduling, seat‑map visualization, capacity limits, preference\nmatching.\n● Built with Python, Streamlit UI, FastAPI services, and Supabase/Postgres; included simple auth and CRUD.\nHOTEL MANAGEMENT SYSTEM\nNovember 2023\n● Engineered and deployed a robust RESTful API for a Hotel Management System, elevating data retrieval and processing\nefficiency by 50% for enhanced user experience and system performance\n● Enabled seamless hotel and room creation, guest registration, reservation booking, and staff management, fostering a 30%\nimprovement in operational efficiency\n● Leveraged Java, Spring Boot, Spring Security, and MySQL for system development, resulting in a 20% decrease in feature\ndevelopment time and quicker deployment\n● GitHub Repository For the Frontend: Link: https://github.com/Duaa3/Hotel-Managements-Project-Frontend\n● GitHub Repository For the Backend: Link: https://github.com/Duaa3/Hotel-Managements-Project-Backend-\nDEEP LEARNING BASED DATE FRUIT CLASSIFICATION SYSTEM\nApril 2022\n● Collaborated with a team of 3 students to develop a date classification method using teachable machine and MIT App for\nfruit identification\n● Implemented JavaScript and deep learning techniques to analyze data from Saudi Arabian and Omani farms, achieving 95%\naccuracy; enabled data-driven decision-making for optimizing agricultural productivity and resource allocation\nPUBLICATION\nRESUME WORDED FINANCE SOCIETY\nDecember 2022\n● [Link:https://www.researchgate.net/publication/369452982_Date_fruit_classification_and_sorting_system_using_Artificial_\nIntelligence_ Application_of_Transfer_Learning]\nMULTICLASS FRUIT CLASSIFICATION MODEL USING DEEP LEARNING\nDecember 2021\n● [Link: https://ijrar.org/viewfulltext.php?&p_id=IJRAR21D2332]\nADDITIONAL\nCertifications & Training:\nAI Empowered Youth Program — Omantel 2025\nIAIDL Essentials — International AI Driving License (IDEL), 2025\nHuawei HCIA‑AI Training (Oman), 2025\nHuawei Advanced Training: (China) AI Internship Professional Practice 2025\nHCIA — Cloud Computing (2025)\nAwards:\nFirst Place — CODELINE Hackathon 2025 (Smart Office Planner)\nHonor Student, University of Technology & Applied Sciences\nFirst Prize — Seventh Engineering Week (May 2022), UTAS Al Musanna\nBEST PAPER — WCASET 2022 (IFERP)\nTechnical Skills\n● Programming & Data Science: Python, NumPy, Pandas, scikit-learn, XGBoost, TensorFlow/Keras (foundations),\nBayesian networks (pgmpy), statistical modeling, class imbalance handling.\n● Machine Learning & Analytics: Exploratory data analysis (EDA), data cleaning, feature engineering, performance\nevaluation (Accuracy, F1-score, ROC-AUC, RMSE, MAE, R²).\n● Computer Vision: OpenCV, image preprocessing and enhancement (histogram equalization, contrast adjustment).\n● Backend & APIs: FastAPI, Uvicorn, RESTful design, authentication, Supabase (Postgres, storage).\n● Web Development: Java, Spring Boot, Angular, Bootstrap; full-stack application development with CI practices.\n● Databases & Data Tools: SQL (PostgreSQL, MySQL), Supabase, web scraping (BeautifulSoup, Selenium).\n● DevOps & Infrastructure: Docker (containerization), Linux CLI, Git/GitHub version control, basic CI/CD pipelines.\n● Visualization & Reporting: Matplotlib, Plotly, Jupyter Notebooks, Markdown documentation.\n● Languages: Fluent in Arabic & English.\n"
1,Nusiba_Alnabhani_cv (1).pdf,Bachelor’s In Information Security,nusibaalnabhani@gmail.com,+968-98995282,https://www.linkedin.com/in/com/in/nusiba-alnabhani,,,,"N U S I B A M U S L I M J U M M A A L N A B H A N I\nUniversity of Technology and Applied Sciences, 2019-2023\nBachelor’s in information security\nMuscat, Oman | +968-98995282 | nusibaalnabhani@gmail.com |linkedin.com/in/nusiba-alnabhani\nPROFILE\nInformation Security graduate with hands on training in cybersecurity, data analysis, and artificial intelligence. Experienced\nin applying technical skills to real world projects, including data preprocessing, network setup, and customer protection\nservices. Eager to contribute to organizational growth by leveraging technical expertise and practical experience.\nEXPERIENCES\nOMANTAL - CODELINE\nJanuary, 2025 (present)\nConducted exploratory data analysis (EDA) on datasets with 50k+ entries using Python (Pandas, NumPy, Matplotlib).\nApplied feature engineering techniques to improve ML model performance by 20%.\nTrained and evaluated machine learning models using scikit-learn, achieving consistent accuracy improvements.\nGained hands-on experience in SQL for data manipulation and automation tasks.\nEXCEED IT SERVICES AND TRAINING COMPANY\nMarch, 2024 - August, 2024\nCompleted 6 intensive courses in Artificial Intelligence, Python, NLP, and Computer Vision.\nBuilt and tested machine learning models on sample datasets, achieving 85% accuracy in classification tasks.\nApplied Azure ML for model deployment and gained practical experience with real-world AI workflows.\nTHE PUBLIC AUTHORITY FOR CUSTOMER PROTECTION\nNovember,2023 - December, 2023\nDrafted and processed 100+ memos, letters, and official communications.\nRegistered and tracked 200+ incoming and outgoing documents using archiving systems.\nDesigned a document classification and retention system to improve record accessibility by 30%.\nInvestigated and reported on customer complaints, contributing to 10+ resolved cases.\nCreated press releases and promotional posters using Canva, enhancing media visibility.\nMINISTY OF INFORMATION\nJun, 2022 - Jul, 2022\nSupported IT team in activating Windows licenses and troubleshooting systems.\nInstalled antivirus software across 50+ devices, improving endpoint security.\nConnected client and server machines to secure networks, reducing downtime by 15%.\nPracticed software development concepts using Python, Java, and Flutter.\nCERTIFICATIONS\nInternational AI Driving License Certificate, 2025\nMicrosoft Azure AI Fundamentals (AI-900), 2024\nScrum Fundamentals Certified, 2024\nPhotoshop from Scratch Workshop, 2023\nInternet of Things and its Applications, 2022\nCyber Security Certified, 2022\nTECHNICAL SKILLS PERSONAL SKILLS\nMicrosoft Office Effective communication with cross-functional\nCisco Packet Tracer teams.\nLinux (Kali) Report writing and documentation.\nProgramming Languages: Python, Java Adaptability in fast-paced technical\nenvironments.\nData-driven decision making.\nACHIEVEMENTS\n“Photoshop from scratch” workshop 2023\nParticipation in the membership of the Student Advisory Council.\nParticipate in the activities of the ""Ghaith"" volunteer team.\nParticipation in the youth leadership camp in October. 2022\nParticipated in the fifth debate competition organized by the university in October.\n""Leaders"" training program organized by the university branch September.\nParticipation in the Rahal Hackathon for tourism projects, which was implemented by the ""Ghobsha"" team in August.\nParticipated in the fourth debate club forum in March.\n""3D printer"" workshop in February.\n"
2,Janna Khalid Al Balushi Resume 14-09-25.pdf,Janna Khalid Al Balushi,janna.balushi@outlook.com,+968 79141663,https://www.linkedin.com/in/Profile,https://github.com/Profile,,,"Janna Khalid Al Balushi\njanna.balushi@outlook.com | +968 79141663 | Oman, Muscat | LinkedIn Profile | GitHub Profile\nProfessional Summary\nAspiring Data Scientist with hands-on experience as a Data Analyst on a national project. Passionate about bridging the\ngap between business needs and technological solutions. My commitment to continuously learning drives me to constantly\nhone my skills in agile methodologies, data science and AI/ML, empowering me to harness cutting-edge solutions that\ndrive impactful business outcomes\nEducational Background\nBachelor of Technology, Electronics & Telecommunications Engineering 2018 – 2024\nUniversity of Technology & Applied Sciences (UTAS) Muscat, Oman\nGPA: 3.13/4\nCapstone Project: School Bus Security System\n• Led the team towards achieving projects’ objectives.\n• Implemented and troubleshooted an IoT-based circuit.\n• Participated in the Traffic Awareness Week Event organized by the Oman Royal Army.\nProfessional Experience\nAI Empowered Youth Associate, Omantel Jan 2025 – Present\n• Assigned as Scrum Master leading the team in agile scrum practices, facilitating meetings, and enhancing team\ncollaboration.\n• Pursuing a comprehensive understanding and application of data science concepts.\n• Gaining hands-on experience in programming with Python and SQL for data manipulation.\n• Familiar with the lifecycle of data science, from data collection to model evaluation and deployment.\n• Conducted data preprocessing and exploratory data analysis (EDA) using Python libraries (Pandas, NumPy, Matplotlib,\nSeaborn) to clean, transform, and visualize datasets.\n• Applied feature engineering techniques such as encoding, scaling, and interaction term creation to optimize model\ninputs and improve performance.\n• Trained and evaluated machine learning models using scikit-learn.\nData Leaders Associate, Omantel Academy Jun 2024 – Dec 2024\n• Proven proficiency in using MS Excel for data analysis.\n• Developed expertise in data cleansing, data analysis, and data Visualization.\n• Contributed to The National System for Planning, Monitoring, and Evaluation project in collaboration with General\nSecretariat of the Council of Minsters and Omantel.\nPeople Operations Trainee, Omantel Mar 2024 – May 2024\n• Planned and managed the events that occurred in Omantel.\n• Communicated and negotiated with business owners/managers to ensure the delivery of high-quality products and\nservices at optimal cost.\n• Maintained comprehensive logistics for events, ensuring seamless coordination and execution.\n• Collaborated with cross-functional teams such as Internal Communications, HQ Experience, and Training to meet the\nKPIs of the events.\nProjects\nThe National System for Planning, Monitoring, and Evaluation July 2024 – April 2025\nGeneral Secretariat of the Council of Ministers\n• Conducted comprehensive data analysis and cleansing to enhance data integrity.\n• Managed and organized data, ensuring accessibility for the team.\n• Executed efficient conversion of files to required formats to maintain data usability.\n• Liaised with governmental institutions to facilitate seamless database communication and collaboration.\n• Evaluated adherence of governmental institutions to required templates, ensuring standards were met.\nPower BI Dashboard for Waggle Startup, Udacity Dec 2024\n• Implemented a Power BI report for Waggle, detailed comparisons between Lapcat prototypes and successful Lapdog\ndevices to assess product viability.\n• Analyzed data from 1,000 prototypes and then visualized findings to answer executive queries.\n• Ensured all visuals and graphics in the Power BI report conformed to Waggle's brand guidelines, incorporating official\ncolor palettes and logos, leading to a cohesive presentation.\n• Enhanced report usability with interactive slicers and demographic visuals, enabling executives and product teams to\ncustomize and explore data insights independently.\nPneumonia Detection Model Using AWS Rekognition AutoML, Udacity Oct 2024\n• Designed and evaluated multiple machine learning models using AWS Rekognition AutoML to classify pediatric chest\nX-rays for pneumonia detection.\n• Built four model variants to analyze the impact of data quality, class imbalance, and label accuracy on model\nperformance.\n• Assessed model efficacy using key performance metrics including precision, recall, and confusion matrices across\nbinary and multiclass classifications.\n• Conducted experiments with balanced vs. unbalanced datasets and simulated label noise to evaluate real-world data\nchallenges.\nCertifications\nBusiness Intelligence and Data Analyst (BIDA), CFI July 2025\nData Analysis and Visualization with Power BI, Udacity Apr 2025\nAI Product Manager, Udacity Nov 2024\nCertified Scrum Product Owner (CSPO), Scrum Alliance Jul 2024\nCertified ScrumMaster (CSM), Scrum Alliance Jun 2024\n"


## 6) Display the Full DataFrame (scrollable)

In [29]:

pd.set_option("display.max_columns", None)
pd.set_option("display.max_colwidth", None)

html_table = df.to_html(escape=False, index=False)
display(HTML(f'''
<div style="max-height: 600px; overflow: auto; border: 1px solid #ccc; padding: 10px; white-space: nowrap;">
{html_table}
</div>
'''))


file_name,name,email,phone,linkedin,github,portfolio,other_links,raw_text
Duaa Hilal Al-Hashmi CV 2025.pdf,Duaa Hilal Al-hashmi,Duaahil44@gmail.com,+968 91799668,,https://github.com/Duaa3/Hotel-Managements-Project-Frontend,,https://github.com/Duaa3/Hotel-Managements-Project-Backend-; https://www.researchgate.net/publication/369452982_Date_fruit_classification_and_sorting_system_using_Artificial_; https://ijrar.org/viewfulltext.php?&p_id=IJRAR21D2332,"Duaa Hilal Al-Hashmi\nAl Batinah North | P: +968 91799668 | Duaahil44@gmail.com\nEDUCATION\nCODELINE\nData Science\nGraduation Date: October 2025\nSAUDI DIGITAL ACADEMY\nFullStack Java Web Development Bootcamp\nGraduation Date: December 2023\nUNIVERSITY OF TECHNOLOGY AND APPLIED SCIENCES\nBachelor of Electronics and Telecommunications Engineering\nCumulative GPA: 3.2/4.0, Graduation Date: August 2022\nEXPERIENCE\nTelecommunications Regulatory Authority.\nMuscat —Internship\n● Completed the Nafath Training Program, acquiring expertise in programming languages (Java, JavaScript, Python, SQL),\nframeworks (Spring Boot, Swagger-UI, Angular), and software design principles (RESTful APIs, SOLID)\n● Applied skills in debugging tools (Postman, IntelliJ Debugger), IDEs (IntelliJ IDEA, Visual Studio Code), and comprehensive\nconcepts (authentication, network protocols, web security) within a structured 3-tier application framework\n● Gained experience in version control systems such as Git and GitHub for collaborative code management, adeptly employing\nAgile methodologies like Scrum, Trello, and Slack for efficient project organization and communication\nUNIVERSITY OF TECHNOLOGY AND APPLIED SCIENCES\nAl Musanna— Research Assistant\nNovember 2021- August 2022\n● Drove the design and development of an IoT-based system for post-harvest data collection in Oman\n● Implemented image processing and AI techniques; successful collaboration with a diverse team of researchers improved the\nsystem, contributing to socioeconomic development\n● Played an integral role in co-authoring and publishing two highly impactful research papers\nSAID BIN SULTAN NAVAL BASE, ROYAL NAVY OF OMAN\nAl Musanna —Internship\nJune 2022- July 2022\n● Engaged in a comprehensive 40-day internship at Said Bin Sultan Naval Base, providing input and contributing to\nworkshops aimed at enhancing health and safety practices\n● Optimized radar systems, improving intercom efficiency, advancing ESM and ECM capabilities\n● Streamlined external communication protocols, enhanced network, and software performance implemented PCB\nenhancements, refined naval calibration procedures, and drove innovation in underwater systems\nPROJECTS\nSMART OFFICE PLANNER\nJuly 2025\n● Hybrid‑work optimization prototype: rule‑based + ML scheduling, seat‑map visualization, capacity limits, preference\nmatching.\n● Built with Python, Streamlit UI, FastAPI services, and Supabase/Postgres; included simple auth and CRUD.\nHOTEL MANAGEMENT SYSTEM\nNovember 2023\n● Engineered and deployed a robust RESTful API for a Hotel Management System, elevating data retrieval and processing\nefficiency by 50% for enhanced user experience and system performance\n● Enabled seamless hotel and room creation, guest registration, reservation booking, and staff management, fostering a 30%\nimprovement in operational efficiency\n● Leveraged Java, Spring Boot, Spring Security, and MySQL for system development, resulting in a 20% decrease in feature\ndevelopment time and quicker deployment\n● GitHub Repository For the Frontend: Link: https://github.com/Duaa3/Hotel-Managements-Project-Frontend\n● GitHub Repository For the Backend: Link: https://github.com/Duaa3/Hotel-Managements-Project-Backend-\nDEEP LEARNING BASED DATE FRUIT CLASSIFICATION SYSTEM\nApril 2022\n● Collaborated with a team of 3 students to develop a date classification method using teachable machine and MIT App for\nfruit identification\n● Implemented JavaScript and deep learning techniques to analyze data from Saudi Arabian and Omani farms, achieving 95%\naccuracy; enabled data-driven decision-making for optimizing agricultural productivity and resource allocation\nPUBLICATION\nRESUME WORDED FINANCE SOCIETY\nDecember 2022\n● [Link:https://www.researchgate.net/publication/369452982_Date_fruit_classification_and_sorting_system_using_Artificial_\nIntelligence_ Application_of_Transfer_Learning]\nMULTICLASS FRUIT CLASSIFICATION MODEL USING DEEP LEARNING\nDecember 2021\n● [Link: https://ijrar.org/viewfulltext.php?&p_id=IJRAR21D2332]\nADDITIONAL\nCertifications & Training:\nAI Empowered Youth Program — Omantel 2025\nIAIDL Essentials — International AI Driving License (IDEL), 2025\nHuawei HCIA‑AI Training (Oman), 2025\nHuawei Advanced Training: (China) AI Internship Professional Practice 2025\nHCIA — Cloud Computing (2025)\nAwards:\nFirst Place — CODELINE Hackathon 2025 (Smart Office Planner)\nHonor Student, University of Technology & Applied Sciences\nFirst Prize — Seventh Engineering Week (May 2022), UTAS Al Musanna\nBEST PAPER — WCASET 2022 (IFERP)\nTechnical Skills\n● Programming & Data Science: Python, NumPy, Pandas, scikit-learn, XGBoost, TensorFlow/Keras (foundations),\nBayesian networks (pgmpy), statistical modeling, class imbalance handling.\n● Machine Learning & Analytics: Exploratory data analysis (EDA), data cleaning, feature engineering, performance\nevaluation (Accuracy, F1-score, ROC-AUC, RMSE, MAE, R²).\n● Computer Vision: OpenCV, image preprocessing and enhancement (histogram equalization, contrast adjustment).\n● Backend & APIs: FastAPI, Uvicorn, RESTful design, authentication, Supabase (Postgres, storage).\n● Web Development: Java, Spring Boot, Angular, Bootstrap; full-stack application development with CI practices.\n● Databases & Data Tools: SQL (PostgreSQL, MySQL), Supabase, web scraping (BeautifulSoup, Selenium).\n● DevOps & Infrastructure: Docker (containerization), Linux CLI, Git/GitHub version control, basic CI/CD pipelines.\n● Visualization & Reporting: Matplotlib, Plotly, Jupyter Notebooks, Markdown documentation.\n● Languages: Fluent in Arabic & English.\n"
Nusiba_Alnabhani_cv (1).pdf,Bachelor’s In Information Security,nusibaalnabhani@gmail.com,+968-98995282,https://www.linkedin.com/in/com/in/nusiba-alnabhani,,,,"N U S I B A M U S L I M J U M M A A L N A B H A N I\nUniversity of Technology and Applied Sciences, 2019-2023\nBachelor’s in information security\nMuscat, Oman | +968-98995282 | nusibaalnabhani@gmail.com |linkedin.com/in/nusiba-alnabhani\nPROFILE\nInformation Security graduate with hands on training in cybersecurity, data analysis, and artificial intelligence. Experienced\nin applying technical skills to real world projects, including data preprocessing, network setup, and customer protection\nservices. Eager to contribute to organizational growth by leveraging technical expertise and practical experience.\nEXPERIENCES\nOMANTAL - CODELINE\nJanuary, 2025 (present)\nConducted exploratory data analysis (EDA) on datasets with 50k+ entries using Python (Pandas, NumPy, Matplotlib).\nApplied feature engineering techniques to improve ML model performance by 20%.\nTrained and evaluated machine learning models using scikit-learn, achieving consistent accuracy improvements.\nGained hands-on experience in SQL for data manipulation and automation tasks.\nEXCEED IT SERVICES AND TRAINING COMPANY\nMarch, 2024 - August, 2024\nCompleted 6 intensive courses in Artificial Intelligence, Python, NLP, and Computer Vision.\nBuilt and tested machine learning models on sample datasets, achieving 85% accuracy in classification tasks.\nApplied Azure ML for model deployment and gained practical experience with real-world AI workflows.\nTHE PUBLIC AUTHORITY FOR CUSTOMER PROTECTION\nNovember,2023 - December, 2023\nDrafted and processed 100+ memos, letters, and official communications.\nRegistered and tracked 200+ incoming and outgoing documents using archiving systems.\nDesigned a document classification and retention system to improve record accessibility by 30%.\nInvestigated and reported on customer complaints, contributing to 10+ resolved cases.\nCreated press releases and promotional posters using Canva, enhancing media visibility.\nMINISTY OF INFORMATION\nJun, 2022 - Jul, 2022\nSupported IT team in activating Windows licenses and troubleshooting systems.\nInstalled antivirus software across 50+ devices, improving endpoint security.\nConnected client and server machines to secure networks, reducing downtime by 15%.\nPracticed software development concepts using Python, Java, and Flutter.\nCERTIFICATIONS\nInternational AI Driving License Certificate, 2025\nMicrosoft Azure AI Fundamentals (AI-900), 2024\nScrum Fundamentals Certified, 2024\nPhotoshop from Scratch Workshop, 2023\nInternet of Things and its Applications, 2022\nCyber Security Certified, 2022\nTECHNICAL SKILLS PERSONAL SKILLS\nMicrosoft Office Effective communication with cross-functional\nCisco Packet Tracer teams.\nLinux (Kali) Report writing and documentation.\nProgramming Languages: Python, Java Adaptability in fast-paced technical\nenvironments.\nData-driven decision making.\nACHIEVEMENTS\n“Photoshop from scratch” workshop 2023\nParticipation in the membership of the Student Advisory Council.\nParticipate in the activities of the ""Ghaith"" volunteer team.\nParticipation in the youth leadership camp in October. 2022\nParticipated in the fifth debate competition organized by the university in October.\n""Leaders"" training program organized by the university branch September.\nParticipation in the Rahal Hackathon for tourism projects, which was implemented by the ""Ghobsha"" team in August.\nParticipated in the fourth debate club forum in March.\n""3D printer"" workshop in February.\n"
Janna Khalid Al Balushi Resume 14-09-25.pdf,Janna Khalid Al Balushi,janna.balushi@outlook.com,+968 79141663,https://www.linkedin.com/in/Profile,https://github.com/Profile,,,"Janna Khalid Al Balushi\njanna.balushi@outlook.com | +968 79141663 | Oman, Muscat | LinkedIn Profile | GitHub Profile\nProfessional Summary\nAspiring Data Scientist with hands-on experience as a Data Analyst on a national project. Passionate about bridging the\ngap between business needs and technological solutions. My commitment to continuously learning drives me to constantly\nhone my skills in agile methodologies, data science and AI/ML, empowering me to harness cutting-edge solutions that\ndrive impactful business outcomes\nEducational Background\nBachelor of Technology, Electronics & Telecommunications Engineering 2018 – 2024\nUniversity of Technology & Applied Sciences (UTAS) Muscat, Oman\nGPA: 3.13/4\nCapstone Project: School Bus Security System\n• Led the team towards achieving projects’ objectives.\n• Implemented and troubleshooted an IoT-based circuit.\n• Participated in the Traffic Awareness Week Event organized by the Oman Royal Army.\nProfessional Experience\nAI Empowered Youth Associate, Omantel Jan 2025 – Present\n• Assigned as Scrum Master leading the team in agile scrum practices, facilitating meetings, and enhancing team\ncollaboration.\n• Pursuing a comprehensive understanding and application of data science concepts.\n• Gaining hands-on experience in programming with Python and SQL for data manipulation.\n• Familiar with the lifecycle of data science, from data collection to model evaluation and deployment.\n• Conducted data preprocessing and exploratory data analysis (EDA) using Python libraries (Pandas, NumPy, Matplotlib,\nSeaborn) to clean, transform, and visualize datasets.\n• Applied feature engineering techniques such as encoding, scaling, and interaction term creation to optimize model\ninputs and improve performance.\n• Trained and evaluated machine learning models using scikit-learn.\nData Leaders Associate, Omantel Academy Jun 2024 – Dec 2024\n• Proven proficiency in using MS Excel for data analysis.\n• Developed expertise in data cleansing, data analysis, and data Visualization.\n• Contributed to The National System for Planning, Monitoring, and Evaluation project in collaboration with General\nSecretariat of the Council of Minsters and Omantel.\nPeople Operations Trainee, Omantel Mar 2024 – May 2024\n• Planned and managed the events that occurred in Omantel.\n• Communicated and negotiated with business owners/managers to ensure the delivery of high-quality products and\nservices at optimal cost.\n• Maintained comprehensive logistics for events, ensuring seamless coordination and execution.\n• Collaborated with cross-functional teams such as Internal Communications, HQ Experience, and Training to meet the\nKPIs of the events.\nProjects\nThe National System for Planning, Monitoring, and Evaluation July 2024 – April 2025\nGeneral Secretariat of the Council of Ministers\n• Conducted comprehensive data analysis and cleansing to enhance data integrity.\n• Managed and organized data, ensuring accessibility for the team.\n• Executed efficient conversion of files to required formats to maintain data usability.\n• Liaised with governmental institutions to facilitate seamless database communication and collaboration.\n• Evaluated adherence of governmental institutions to required templates, ensuring standards were met.\nPower BI Dashboard for Waggle Startup, Udacity Dec 2024\n• Implemented a Power BI report for Waggle, detailed comparisons between Lapcat prototypes and successful Lapdog\ndevices to assess product viability.\n• Analyzed data from 1,000 prototypes and then visualized findings to answer executive queries.\n• Ensured all visuals and graphics in the Power BI report conformed to Waggle's brand guidelines, incorporating official\ncolor palettes and logos, leading to a cohesive presentation.\n• Enhanced report usability with interactive slicers and demographic visuals, enabling executives and product teams to\ncustomize and explore data insights independently.\nPneumonia Detection Model Using AWS Rekognition AutoML, Udacity Oct 2024\n• Designed and evaluated multiple machine learning models using AWS Rekognition AutoML to classify pediatric chest\nX-rays for pneumonia detection.\n• Built four model variants to analyze the impact of data quality, class imbalance, and label accuracy on model\nperformance.\n• Assessed model efficacy using key performance metrics including precision, recall, and confusion matrices across\nbinary and multiclass classifications.\n• Conducted experiments with balanced vs. unbalanced datasets and simulated label noise to evaluate real-world data\nchallenges.\nCertifications\nBusiness Intelligence and Data Analyst (BIDA), CFI July 2025\nData Analysis and Visualization with Power BI, Udacity Apr 2025\nAI Product Manager, Udacity Nov 2024\nCertified Scrum Product Owner (CSPO), Scrum Alliance Jul 2024\nCertified ScrumMaster (CSM), Scrum Alliance Jun 2024\n"


## 7) Save to CSV

In [30]:

df.to_csv(OUTPUT_CSV, index=False, encoding="utf-8")
print(f"✅ Data saved to: {OUTPUT_CSV}")


✅ Data saved to: C:\Users\bbuser\Downloads\cv_data_extracted_final.csv
