# Choose a Data Set

You can analyze any data you like! Remember, you need 1000 rows of non-null data to get 5 points for the "Data" criterion of my [rubric](https://docs.google.com/document/d/1s3wllcF3LLnytxwD8mZ-BCypXKnfaahnizWGNojT-B4/edit?usp=sharing). Consider looking at [Kaggle](https://www.kaggle.com/datasets) or [free APIs](https://free-apis.github.io/#/browse) for datasets of this size. Alternatively, you can scrape the web to make your own dataset! :D

Once you have chosen your dataset, please read your data into a dataframe and call `.info()` below. If you don't call `.info()`, I will give you 0 points for the first criterion described on the [rubric](https://docs.google.com/document/d/1s3wllcF3LLnytxwD8mZ-BCypXKnfaahnizWGNojT-B4/edit?usp=sharing).
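
As a rough illustration of that step, the sketch below builds a dataframe, calls `.info()`, and counts the rows with no null values. The `rows` data and its column names are hypothetical stand-ins, not part of any real dataset; in practice the dataframe would come from something like `pd.read_csv` or from scraped results.

```python
import pandas as pd

# Hypothetical stand-in data: 1000 placeholder rows with two made-up columns.
# Replace this with your own source, e.g. pd.read_csv("your_file.csv").
rows = [{"article": f"Article_{i}", "domain": "example.org"} for i in range(1000)]
df = pd.DataFrame(rows)

# Prints the column names, non-null counts, and dtypes.
df.info()

# Count the rows that contain no null values in any column.
complete_rows = len(df.dropna())
print(f"Rows with no null values: {complete_rows}")
```

The non-null counts reported by `.info()` are per column, so the `dropna()` count is a useful cross-check on how many fully populated rows there are.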

In [15]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
from urllib.parse import urljoin
import re
from collections import Counter

def fetch_links(url):
    """
    Fetch all article and sub-index links from a Wikipedia page.
    """
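    try:
        # A timeout keeps one unresponsive page from hanging the whole crawl
        response = requests.get(url, timeout=10)
    except requests.RequestException as exc:
        print(f"Failed to fetch page: {url} ({exc})")
        return []
    if response.status_code != 200:
        print(f"Failed to fetch page: {url}")
        return []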
    
    soup = BeautifulSoup(response.text, 'html.parser')
    links = []
    
    # Extract all relevant links from the page
    for a_tag in soup.find_all('a', href=True):
        href = a_tag['href']
        if href.startswith('/wiki/') and ':' not in href:  # Ignore special pages like "Category:"
            full_url = urljoin("https://en.wikipedia.org", href)
            links.append(full_url)
    
    return list(set(links))  # Remove duplicates

def scrape_sources(url):
    """
    Scrape the sources (references) from a Wikipedia article.
    """
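    try:
        # A timeout keeps one unresponsive article from hanging the whole crawl
        response = requests.get(url, timeout=10)
    except requests.RequestException as exc:
        print(f"Failed to fetch article: {url} ({exc})")
        return []
    if response.status_code != 200:
        print(f"Failed to fetch article: {url}")
        return []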
    
    soup = BeautifulSoup(response.text, 'html.parser')
    references = soup.find_all('ol', class_='references')
    sources = []
    
    for ref_list in references:
        for ref in ref_list.find_all('li'):
            link = ref.find('a', href=True)
            if link and 'http' in link['href']:
                sources.append(link['href'])
    
    return sources

def main():
    science_index_url = "https://en.wikipedia.org/wiki/Category:Indexes_of_science_articles"
    source_limit = 1000  # Limit for total sources
    
    # Step 1: Fetch all links from the science index
    print("Fetching main index...")
    all_links = fetch_links(science_index_url)
    
    # Step 2: Collect all unique article links from sub-indexes
    article_links = set()
    print("Fetching sub-indexes...")
    for link in all_links:
        if len(article_links) >= source_limit:  # Stop once enough article links are collected (reuses source_limit as the cap)
            break
        print(f"Fetching links from: {link}")
        sub_links = fetch_links(link)
        article_links.update(sub_links)
    
    print(f"Total articles to scrape: {len(article_links)}")
    
    # Step 3: Scrape sources from all articles
    domain_counter = Counter()
    total_sources = 0
    
    for idx, article in enumerate(article_links):
        if total_sources >= source_limit:  # Stop if source limit reached
            break
        print(f"Scraping article {idx + 1}/{len(article_links)}: {article}")
        sources = scrape_sources(article)
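        # Extract the host of each source URL; the pattern no longer requires a
        # trailing slash, so bare URLs such as "https://example.com" are counted too
        matches = (re.search(r'https?://([^/?#]+)', src) for src in sources)
        domains = [m.group(1) for m in matches if m]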
        domain_counter.update(domains)
        total_sources += len(sources)
    
    # Step 4: Report the most frequently cited domains
    print("\nTop 10 referenced domains:")
    for domain, count in domain_counter.most_common(10):
        print(f"{domain}: {count}")
    

if __name__ == "__main__":
    main()

Fetching main index...
Fetching sub-indexes...
Fetching links from: https://en.wikipedia.org/wiki/Index_of_optics_articles
Fetching links from: https://en.wikipedia.org/wiki/Index_of_oral_health_and_dental_articles
Total articles to scrape: 1212
Scraping article 1/1212: https://en.wikipedia.org/wiki/Radial_polarisation
Scraping article 2/1212: https://en.wikipedia.org/wiki/Dental_public_health
Scraping article 3/1212: https://en.wikipedia.org/wiki/Lux
Scraping article 4/1212: https://en.wikipedia.org/wiki/Interferometry
Scraping article 5/1212: https://en.wikipedia.org/wiki/Colorimetry
Scraping article 6/1212: https://en.wikipedia.org/wiki/Hertwig%27s_epithelial_root_sheath
Scraping article 7/1212: https://en.wikipedia.org/wiki/Optical_processor
Scraping article 8/1212: https://en.wikipedia.org/wiki/Dental_phobia
Scraping article 9/1212: https://en.wikipedia.org/wiki/Mumps
Scraping article 10/1212: https://en.wikipedia.org/wiki/Angle_of_incidence_(optics)
Scraping article 11/1212: http

Scraping article 108/1212: https://en.wikipedia.org/wiki/Integrated_optics
Scraping article 109/1212: https://en.wikipedia.org/wiki/Polaroid_(polarizer)
Scraping article 110/1212: https://en.wikipedia.org/wiki/FDI_World_Dental_Federation_notation
Scraping article 111/1212: https://en.wikipedia.org/wiki/Hexetidine
Scraping article 112/1212: https://en.wikipedia.org/wiki/Anodontia
Scraping article 113/1212: https://en.wikipedia.org/wiki/Ultraviolet_catastrophe
Scraping article 114/1212: https://en.wikipedia.org/wiki/Dental_floss
Scraping article 115/1212: https://en.wikipedia.org/wiki/Marian_Spore_Bush
Scraping article 116/1212: https://en.wikipedia.org/wiki/General_Dental_Council
Scraping article 117/1212: https://en.wikipedia.org/wiki/Kerr-lens_modelocking
Scraping article 118/1212: https://en.wikipedia.org/wiki/Optical_window
Scraping article 119/1212: https://en.wikipedia.org/wiki/Simon_Hullihen
Scraping article 120/1212: https://en.wikipedia.org/wiki/Cementoblastoma
Scraping article

Scraping article 216/1212: https://en.wikipedia.org/wiki/SoftDent
Scraping article 217/1212: https://en.wikipedia.org/wiki/Gaussian_beam
Scraping article 218/1212: https://en.wikipedia.org/wiki/Raman_Bedi
Scraping article 219/1212: https://en.wikipedia.org/wiki/Palatine_uvula
Scraping article 220/1212: https://en.wikipedia.org/wiki/Dental_Laboratories_Association
Scraping article 221/1212: https://en.wikipedia.org/wiki/Odontode
Scraping article 222/1212: https://en.wikipedia.org/wiki/Interrod_enamel
Scraping article 223/1212: https://en.wikipedia.org/wiki/Idiopathic_osteosclerosis
Scraping article 224/1212: https://en.wikipedia.org/wiki/Northwestern_University_Dental_School
Scraping article 225/1212: https://en.wikipedia.org/wiki/Optical_bench
Scraping article 226/1212: https://en.wikipedia.org/wiki/Zodiacal_light
Scraping article 227/1212: https://en.wikipedia.org/wiki/Dental_syringe
Scraping article 228/1212: https://en.wikipedia.org/wiki/Kalodont
Scraping article 229/1212: https://e

Scraping article 325/1212: https://en.wikipedia.org/wiki/Odontogenic_myxoma
Scraping article 326/1212: https://en.wikipedia.org/wiki/Information_theory
Scraping article 327/1212: https://en.wikipedia.org/wiki/Acousto-optics
Scraping article 328/1212: https://en.wikipedia.org/wiki/Concave_lens
Scraping article 329/1212: https://en.wikipedia.org/wiki/Reflection_(optics)
Scraping article 330/1212: https://en.wikipedia.org/wiki/Chewable_toothbrush
Scraping article 331/1212: https://en.wikipedia.org/wiki/Fiberotomy
Scraping article 332/1212: https://en.wikipedia.org/wiki/Fourier_optics
Scraping article 333/1212: https://en.wikipedia.org/wiki/Speckle_interferometry
Scraping article 334/1212: https://en.wikipedia.org/wiki/Maury_Massler
Scraping article 335/1212: https://en.wikipedia.org/wiki/Aquafresh
Scraping article 336/1212: https://en.wikipedia.org/wiki/Brewster%27s_angle
Scraping article 337/1212: https://en.wikipedia.org/wiki/Hyperdontia
Scraping article 338/1212: https://en.wikipedia.o

Scraping article 433/1212: https://en.wikipedia.org/wiki/Evanescent_wave
Scraping article 434/1212: https://en.wikipedia.org/wiki/Total_internal_reflection
Scraping article 435/1212: https://en.wikipedia.org/wiki/Journal_of_Periodontology
Scraping article 436/1212: https://en.wikipedia.org/wiki/Cementoblast
Scraping article 437/1212: https://en.wikipedia.org/wiki/Dental_avulsion
Scraping article 438/1212: https://en.wikipedia.org/wiki/Neonatal_teeth
Scraping article 439/1212: https://en.wikipedia.org/wiki/Nonimaging_optics
Scraping article 440/1212: https://en.wikipedia.org/wiki/Scope_(mouthwash)
Scraping article 441/1212: https://en.wikipedia.org/wiki/Refracting_telescope
Scraping article 442/1212: https://en.wikipedia.org/wiki/University_of_Tennessee_College_of_Dentistry
Scraping article 443/1212: https://en.wikipedia.org/wiki/Amosan
Scraping article 444/1212: https://en.wikipedia.org/wiki/Root_canal
Scraping article 445/1212: https://en.wikipedia.org/wiki/Interdental_brush
Scraping 

Scraping article 541/1212: https://en.wikipedia.org/wiki/Teledentistry
Scraping article 542/1212: https://en.wikipedia.org/wiki/Photoresistor
Scraping article 543/1212: https://en.wikipedia.org/wiki/Mastication
Scraping article 544/1212: https://en.wikipedia.org/wiki/Toothbrush
Scraping article 545/1212: https://en.wikipedia.org/wiki/Erythroplakia
Scraping article 546/1212: https://en.wikipedia.org/wiki/Apex_locator
Scraping article 547/1212: https://en.wikipedia.org/wiki/Dental_alveolus
Scraping article 548/1212: https://en.wikipedia.org/wiki/Lucy_Hobbs_Taylor
Scraping article 549/1212: https://en.wikipedia.org/wiki/Stellate_reticulum
Scraping article 550/1212: https://en.wikipedia.org/wiki/Dappen_glass
Scraping article 551/1212: https://en.wikipedia.org/wiki/Ameloblastoma
Scraping article 552/1212: https://en.wikipedia.org/wiki/Stereoscopy
Scraping article 553/1212: https://en.wikipedia.org/wiki/Cheilitis
Scraping article 554/1212: https://en.wikipedia.org/wiki/Tufts_University_Schoo

Scraping article 652/1212: https://en.wikipedia.org/wiki/Superior_mouth
Scraping article 653/1212: https://en.wikipedia.org/wiki/Alpenglow
Scraping article 654/1212: https://en.wikipedia.org/wiki/CEREC
Scraping article 655/1212: https://en.wikipedia.org/wiki/Frank_Abbott_(dentist)
Scraping article 656/1212: https://en.wikipedia.org/wiki/Posterior_superior_alveolar_artery
Scraping article 657/1212: https://en.wikipedia.org/wiki/Laser_pumping
Scraping article 658/1212: https://en.wikipedia.org/wiki/Dental_care_of_Guantanamo_Bay_detainees
Scraping article 659/1212: https://en.wikipedia.org/wiki/Gardner%27s_syndrome
Scraping article 660/1212: https://en.wikipedia.org/wiki/Optical_properties
Scraping article 661/1212: https://en.wikipedia.org/wiki/F-number
Scraping article 662/1212: https://en.wikipedia.org/wiki/Photon_polarization
Scraping article 663/1212: https://en.wikipedia.org/wiki/Polymorphous_low-grade_adenocarcinoma
Scraping article 664/1212: https://en.wikipedia.org/wiki/Optical_e

Scraping article 759/1212: https://en.wikipedia.org/wiki/Atom_optics
Scraping article 760/1212: https://en.wikipedia.org/wiki/Infrared
Scraping article 761/1212: https://en.wikipedia.org/wiki/Treatment_of_knocked-out_(avulsed)_teeth
Scraping article 762/1212: https://en.wikipedia.org/wiki/Dental_trauma
Scraping article 763/1212: https://en.wikipedia.org/wiki/Dental_follicle
Scraping article 764/1212: https://en.wikipedia.org/wiki/Dental_radiography
Scraping article 765/1212: https://en.wikipedia.org/wiki/Scheimpflug_principle
Scraping article 766/1212: https://en.wikipedia.org/wiki/Umbra
Scraping article 767/1212: https://en.wikipedia.org/wiki/Chapin_A._Harris
Scraping article 768/1212: https://en.wikipedia.org/wiki/Transparency_(optics)
Scraping article 769/1212: https://en.wikipedia.org/wiki/Georg_Carabelli
Scraping article 770/1212: https://en.wikipedia.org/wiki/Gingivitis
Scraping article 771/1212: https://en.wikipedia.org/wiki/Traumatic_neuroma
Scraping article 772/1212: https://e

Scraping article 867/1212: https://en.wikipedia.org/wiki/Tongue_cleaner
Scraping article 868/1212: https://en.wikipedia.org/wiki/Mirror
Scraping article 869/1212: https://en.wikipedia.org/wiki/Mammelon
Scraping article 870/1212: https://en.wikipedia.org/wiki/Orville_Howard_Phillips
Scraping article 871/1212: https://en.wikipedia.org/wiki/Modelocking
Scraping article 872/1212: https://en.wikipedia.org/wiki/Radioactive_dentin_abrasion
Scraping article 873/1212: https://en.wikipedia.org/wiki/Temporomandibular_joint_disorder
Scraping article 874/1212: https://en.wikipedia.org/wiki/Odontogenic_cyst
Scraping article 875/1212: https://en.wikipedia.org/wiki/Lip_piercing
Scraping article 876/1212: https://en.wikipedia.org/wiki/Radius_of_curvature_(optics)
Scraping article 877/1212: https://en.wikipedia.org/wiki/Shovel-shaped_incisors
Scraping article 878/1212: https://en.wikipedia.org/wiki/Pattern_recognition
Scraping article 879/1212: https://en.wikipedia.org/wiki/Metastatic_tumor_of_jaws
Scra

Scraping article 977/1212: https://en.wikipedia.org/wiki/David_J._Acer
Scraping article 978/1212: https://en.wikipedia.org/wiki/Mandibular_central_incisor
Scraping article 979/1212: https://en.wikipedia.org/wiki/Atomic,_molecular,_and_optical_physics
Scraping article 980/1212: https://en.wikipedia.org/wiki/Optical_theorem
Scraping article 981/1212: https://en.wikipedia.org/wiki/Martin_van_Butchell
Scraping article 982/1212: https://en.wikipedia.org/wiki/Cathodoluminescence
Scraping article 983/1212: https://en.wikipedia.org/wiki/Euthymol
Scraping article 984/1212: https://en.wikipedia.org/wiki/Edward_Maynard
Scraping article 985/1212: https://en.wikipedia.org/wiki/Charles_Goodall_Lee
Scraping article 986/1212: https://en.wikipedia.org/wiki/Kolynos
Scraping article 987/1212: https://en.wikipedia.org/wiki/Painless_Parker
Scraping article 988/1212: https://en.wikipedia.org/wiki/Alveolar_process_of_maxilla
Scraping article 989/1212: https://en.wikipedia.org/wiki/Gold_teeth
Scraping article

Scraping article 1087/1212: https://en.wikipedia.org/wiki/Dental_notation
Scraping article 1088/1212: https://en.wikipedia.org/wiki/Reflection_coefficient
Scraping article 1089/1212: https://en.wikipedia.org/wiki/Norman_Simmons
Scraping article 1090/1212: https://en.wikipedia.org/wiki/Dental_canaliculi
Scraping article 1091/1212: https://en.wikipedia.org/wiki/Oral_hygiene
Scraping article 1092/1212: https://en.wikipedia.org/wiki/Dental_composite
Scraping article 1093/1212: https://en.wikipedia.org/wiki/Pedodontics
Scraping article 1094/1212: https://en.wikipedia.org/wiki/Caries_vaccine
Scraping article 1095/1212: https://en.wikipedia.org/wiki/Tongue
Scraping article 1096/1212: https://en.wikipedia.org/wiki/Lasing_threshold
Scraping article 1097/1212: https://en.wikipedia.org/wiki/Dentrix
Scraping article 1098/1212: https://en.wikipedia.org/wiki/MFDS
Scraping article 1099/1212: https://en.wikipedia.org/wiki/Tuftelin
Scraping article 1100/1212: https://en.wikipedia.org/wiki/Enamel_organ


Scraping article 1195/1212: https://en.wikipedia.org/wiki/Prism_compressor
Scraping article 1196/1212: https://en.wikipedia.org/wiki/Crown_(tooth)
Scraping article 1197/1212: https://en.wikipedia.org/wiki/International_Association_for_Dental_Research
Scraping article 1198/1212: https://en.wikipedia.org/wiki/Royal_College_of_Dentists
Scraping article 1199/1212: https://en.wikipedia.org/wiki/Refraction
Scraping article 1200/1212: https://en.wikipedia.org/wiki/Whitening_strips
Scraping article 1201/1212: https://en.wikipedia.org/wiki/Benign_lymphoepithelial_lesion
Scraping article 1202/1212: https://en.wikipedia.org/wiki/Dental_auxiliary
Scraping article 1203/1212: https://en.wikipedia.org/wiki/Free-space_optical_communication
Scraping article 1204/1212: https://en.wikipedia.org/wiki/Circle_of_confusion
Scraping article 1205/1212: https://en.wikipedia.org/wiki/Tooth_painting
Scraping article 1206/1212: https://en.wikipedia.org/wiki/History_of_dental_treatments
Scraping article 1207/1212: 

# My Question

### Which sources are most frequently cited across Wikipedia articles in a specific domain, and what does this tell us about the reliability and diversity of the information on Wikipedia?

# My Analysis

In [5]:
# Analyze here
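
One possible starting point for this analysis is sketched below, assuming the scraped reference URLs have been collected into a single list. The `sources` list here is hypothetical sample data, not real scraped output, and the two summary numbers (count of unique domains and share held by the top domain) are just one rough way to look at diversity.

```python
from urllib.parse import urlparse

import pandas as pd

# Hypothetical sample of reference URLs. In practice this would be the combined
# output of scrape_sources() across all scraped articles.
sources = [
    "https://doi.org/10.1000/example1",
    "https://doi.org/10.1000/example2",
    "https://www.ncbi.nlm.nih.gov/pubmed/0000000",
    "https://books.google.com/books?id=example",
]

# One row per cited source, with the host name pulled out of each URL.
df = pd.DataFrame({"url": sources})
df["domain"] = df["url"].map(lambda u: urlparse(u).netloc)
df.info()

# Frequency of each domain, most cited first.
domain_counts = df["domain"].value_counts()
print(domain_counts.head(10))

# Two simple diversity indicators: how many distinct domains appear,
# and what fraction of all citations go to the single most-cited domain.
unique_domains = df["domain"].nunique()
top_share = domain_counts.iloc[0] / len(df)
print(f"Unique domains: {unique_domains}, top-domain share: {top_share:.2f}")
```

Keeping one row per source, rather than only a `Counter` of domains, also gives a dataframe whose row count can be checked against the 1000-row requirement.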

# My Answer

### Write your answer here.