## Part 2)

- For the same person as in Step 1), use the Wikipedia API to retrieve the full content of that person's Wikipedia page.
- The goal of Part 2) is to build the capability to:
  1. Determine the sentiment of the entire Wikipedia page
  1. Print out the Wikipedia article
  1. Collect the Wikipedia pages of the 10 nearest neighbors from Step 1)
  1. Determine the nearness ranking of these 10 to your main subject based on their entire Wikipedia pages
  1. Compare the nearness ranking from Step 1) with the Wikipedia page nearness ranking
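
The last goal in the list above, comparing two rankings, can be summarized with a single number. The block below is a minimal sketch, not part of the original notebook: the helper `spearman_rho` and the example lists `step1` and `wiki` are hypothetical names introduced only for illustration. It assumes both rankings contain exactly the same names with no ties.

```python
def spearman_rho(ranking_a, ranking_b):
    """Spearman rank correlation for two orderings of the same items (no ties)."""
    if (len(ranking_a) != len(ranking_b)
            or len(set(ranking_a)) != len(ranking_a)
            or set(ranking_a) != set(ranking_b)):
        raise ValueError("rankings must contain the same unique items")
    n = len(ranking_a)
    if n < 2:
        raise ValueError("need at least two items")
    # Position of every item in the second ranking
    pos_b = {name: i for i, name in enumerate(ranking_b)}
    # Sum of squared rank differences
    d_squared = sum((i - pos_b[name]) ** 2 for i, name in enumerate(ranking_a))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical rankings, used only to illustrate the call
step1 = ["A", "B", "C", "D"]
wiki = ["A", "C", "B", "D"]
print(spearman_rho(step1, wiki))
```

A value near 1 means the two rankings largely agree, and a value near -1 means they are close to reversed. If one ranking covers fewer names than the other, both would need to be restricted to the names they share before calling the helper.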

In [19]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors
from textblob import TextBlob

# Install and import the wikipedia library
!pip install wikipedia
import wikipedia

In [32]:
def get_wikipedia_content(person_name):
    """Return the full text of a person's Wikipedia page, or None if no page exists."""
    try:
        page = wikipedia.page(person_name)
        return page.content
    except wikipedia.exceptions.PageError:
        # No page matches this title
        return None

In [28]:
def analyze_wikipedia_content(person_name, nearest_neighbors):
    # Fetch Wikipedia content for the main person
    main_content = get_wikipedia_content(person_name)
    
    if main_content is None:
        print(f"No Wikipedia page found for {person_name}")
        return None, None
    
    # Calculate sentiment of the entire page
    main_sentiment = TextBlob(main_content).sentiment
    
    # Print a preview (first 500 characters) of the Wikipedia article
    print(f"Wikipedia article for {person_name}:")
    print(main_content[:500] + "...")
    
    # Collect Wikipedia pages from the nearest neighbors, keeping each name
    # aligned with its content so that skipped pages cannot shift the indices
    neighbor_names = []
    neighbor_contents = []
    for neighbor in nearest_neighbors:
        content = get_wikipedia_content(neighbor)
        if content is not None:
            neighbor_names.append(neighbor)
            neighbor_contents.append(content)

    if len(neighbor_contents) < 2:
        print("Not enough data to perform KNN. Skipping ranking.")
        return main_sentiment, nearest_neighbors

    # Determine nearness ranking based on entire Wikipedia pages
    vectorizer = TfidfVectorizer(stop_words='english')
    X = vectorizer.fit_transform([main_content] + neighbor_contents)
    # Row 0 (the main page) comes back as its own nearest match, so a cap of 5
    # yields at most 4 ranked neighbors
    nn = NearestNeighbors(n_neighbors=min(len(neighbor_contents) + 1, 5), metric='cosine')
    nn.fit(X)

    distances, indices = nn.kneighbors(X[0])
    # Drop the main page itself, then map row i back to neighbor i - 1
    wikipedia_ranking = [neighbor_names[i - 1] for i in indices[0] if i != 0]
    
    return main_sentiment, wikipedia_ranking
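
The function above ranks at most four neighbors, because the nearest-neighbor query is capped at five results and the main page itself takes one slot. To rank every collected page, one option is to score each neighbor directly against the main text and sort. The block below is a simplified sketch of that idea using only the standard library: it uses raw word counts rather than TF-IDF weights, and `term_counts`, `cosine_similarity`, `rank_by_similarity`, and the toy texts are hypothetical names and data introduced for illustration, not part of the original notebook.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase word counts; a crude stand-in for a real vectorizer."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_similarity(main_text, named_texts):
    """Return (name, similarity) pairs sorted from most to least similar."""
    main_vec = term_counts(main_text)
    scored = [(name, cosine_similarity(main_vec, term_counts(text)))
              for name, text in named_texts.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy texts, invented for illustration only
docs = {
    "physics": "relativity and quantum physics theory",
    "biology": "evolution of species by natural selection",
    "mixed": "quantum theory and natural selection",
}
for name, score in rank_by_similarity("quantum physics and relativity theory", docs):
    print(f"{name}: {score:.3f}")
```

Because names and scores stay paired throughout, a page that fails to download cannot shift the ranking onto the wrong person. With real article text, TF-IDF weighting and stop-word removal (as in the notebook's `TfidfVectorizer`) would likely give more meaningful scores than raw counts.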


In [None]:
# Test the function
person_name = "Albert Einstein"  # Use a well-known person as an example
nearest_neighbors = ["Marie Curie", "Isaac Newton", "Galileo Galilei", "Stephen Hawking", "Richard Feynman", "Nikola Tesla", "Charles Darwin", "Aristotle", "Archimedes", "Leonardo da Vinci"]

main_sentiment, wikipedia_ranking = analyze_wikipedia_content(person_name, nearest_neighbors)

if main_sentiment is not None:
    print(f"\nSentiment of {person_name}'s Wikipedia page:")
    print(f"Polarity: {main_sentiment.polarity}")
    print(f"Subjectivity: {main_sentiment.subjectivity}")

    print("\nWikipedia ranking of nearest neighbors:")
    for i, neighbor in enumerate(wikipedia_ranking):
        print(f"{i+1}. {neighbor}")

    # Compare rankings
    original_ranking = nearest_neighbors
    print("\nComparison of rankings:")
    for i in range(len(original_ranking)):
        if original_ranking[i] in wikipedia_ranking:
            print(f"{original_ranking[i]}: Original rank {i+1}, Wikipedia rank {wikipedia_ranking.index(original_ranking[i])+1}")
        else:
            print(f"{original_ranking[i]}: Original rank {i+1}, Not found in Wikipedia ranking")
else:
    print(f"No Wikipedia page found for {person_name}. Unable to perform analysis.")

Wikipedia article for Albert Einstein:
Albert Einstein (14 March 1879 – 18 April 1955) was a German-born theoretical physicist who is widely held as one of the most influential scientists. Best known for developing the theory of relativity, Einstein also made important contributions to quantum mechanics. His mass–energy equivalence formula E = mc2, which arises from special relativity, has been called "the world's most famous equation". He received the 1921 Nobel Prize in Physics "for his services to theoretical physics, and especial...

Sentiment of Albert Einstein's Wikipedia page:
Polarity: 0.11999025762235141
Subjectivity: 0.41687531999420424

Wikipedia ranking of nearest neighbors:
1. Marie Curie
2. Richard Feynman
3. Stephen Hawking
4. Galileo Galilei

Comparison of rankings:
Marie Curie: Original rank 1, Wikipedia rank 1
Isaac Newton: Original rank 2, Not found in Wikipedia ranking
Galileo Galilei: Original rank 3, Wikipedia rank 4
Stephen Hawking: Original rank 4, Wikipedia ran