## Scrape Wikipedia

### 1. Installing the wikipedia Package

To start extracting information from Wikipedia, first install the wikipedia package. In a Jupyter notebook you can do this directly from a cell:

In [2]:
%pip install wikipedia

Note: you may need to restart the kernel to use updated packages.
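If the import in the next step fails right after installing, it can help to check whether Python can locate the package at all. The helper below, is_importable (a hypothetical name, not part of the wikipedia package), is a minimal sketch built on the standard library's importlib.util.find_spec, which returns None when a module cannot be found.

```python
import importlib
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate `module_name` on the current path."""
    # Clear the import system's finder caches so that a package installed
    # during this session has a chance to be discovered.
    importlib.invalidate_caches()
    # find_spec returns None when the top-level module cannot be found
    return importlib.util.find_spec(module_name) is not None


print(is_importable("wikipedia"))
```

If this prints False even after a successful install, restarting the kernel, as the note above suggests, is the most reliable fix.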


### 2. Importing Wikipedia

In [14]:
import wikipedia

Once the package is imported, its functions, such as search(), summary(), and page(), retrieve content from Wikipedia through the MediaWiki API.

### 3. Defining the Search Query

We define a variable search_query and assign it the value "Data Analysis", the term we want to look up on Wikipedia. wikipedia.search() returns a list of matching page titles.

In [16]:
search_query = "Data Analysis"
search_results = wikipedia.search(search_query)
print(search_results)

['Data analysis', 'Exploratory data analysis', 'Big data', 'Topological data analysis', 'Multivariate statistics', 'Functional data analysis', 'Forensic data analysis', 'Data', 'Qualitative research', 'Distributional data analysis']
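wikipedia.search() returns up to ten titles by default (the count is set with its results parameter), and the first entry is not guaranteed to be the page you meant. The hypothetical helper pick_exact_match below sketches one way to choose the title that matches the query, ignoring case, before passing it to summary() or page(). The result list is copied from the output above.

```python
from typing import Optional, Sequence


def pick_exact_match(query: str, titles: Sequence[str]) -> Optional[str]:
    """Return the first title equal to `query` ignoring case, else None."""
    wanted = query.casefold()
    for title in titles:
        if title.casefold() == wanted:
            return title
    return None


# Search results copied from the output above
results = ['Data analysis', 'Exploratory data analysis', 'Big data',
           'Topological data analysis', 'Multivariate statistics',
           'Functional data analysis', 'Forensic data analysis', 'Data',
           'Qualitative research', 'Distributional data analysis']

print(pick_exact_match("Data Analysis", results))  # Data analysis
```

Returning None instead of falling back to the first result makes it explicit when the search did not find the page you asked for.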


In [17]:
summary = wikipedia.summary("Data Analysis")
print(summary)

Data analysis is the process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, and is used in different business, science, and social science domains. In today's business world, data analysis plays a role in making decisions more scientific and helping businesses operate more effectively.
Data mining is a particular data analysis technique that focuses on statistical modeling and knowledge discovery for predictive rather than purely descriptive purposes, while business intelligence covers data analysis that relies heavily on aggregation, focusing mainly on business information. In statistical applications, data analysis can be divided into descriptive statistics, exploratory data analysis (EDA), and confirmatory data analysis (CDA). EDA focuses on discovering new featur
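The summary above is the page's introductory section. To keep only its opening, the package's summary() function accepts a sentences argument, for example wikipedia.summary("Data Analysis", sentences=2). For text you have already fetched, the hypothetical helper first_sentences below does a comparable trim offline. It is a sketch that uses a naive regular expression, so abbreviations such as "e.g." will be split incorrectly.

```python
import re


def first_sentences(text: str, n: int) -> str:
    """Return the first `n` sentences of `text` (naive regex split)."""
    # Split wherever '.', '!' or '?' is followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:n])


# Opening of the English summary printed above
intro = (
    "Data analysis is the process of inspecting, cleansing, transforming, and "
    "modeling data with the goal of discovering useful information, informing "
    "conclusions, and supporting decision-making. Data analysis has multiple "
    "facets and approaches, encompassing diverse techniques under a variety of "
    "names, and is used in different business, science, and social science "
    "domains. In today's business world, data analysis plays a role in making "
    "decisions more scientific and helping businesses operate more effectively."
)

print(first_sentences(intro, 1))
```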

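### 4. Retrieving a Summary in Another Language

To retrieve the summary of a Wikipedia page in a language other than English, set the language with wikipedia.set_lang() before calling summary(); summary() itself has no lang parameter. The argument is the language prefix of the Wikipedia edition (for example "fr" for French or "ar" for Arabic), and the page title must be written in that language.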

In [22]:
# Set the language to French
wikipedia.set_lang("fr")

# Retrieve the summary for "Data Analysis" in French
summary_french = wikipedia.summary("Analyse de données")
print(summary_french)

L’analyse des données (aussi appelée analyse exploratoire des données ou AED) est une famille de méthodes statistiques dont les principales caractéristiques sont d'être multidimensionnelles et descriptives. Dans l'acception française, la terminologie « analyse des données » désigne donc un sous-ensemble de ce qui est appelé plus généralement la statistique multivariée. Certaines méthodes, pour la plupart géométriques, aident à faire ressortir les relations pouvant exister entre les différentes données et à en tirer une information statistique qui permet de décrire de façon plus succincte les principales informations contenues dans ces données. D'autres techniques permettent de regrouper les données de façon à faire apparaître clairement ce qui les rend homogènes, et ainsi mieux les connaître.
L’analyse des données permet de traiter un nombre très important de données et de dégager les aspects les plus intéressants de la structure de celles-ci. Le succès de cette discipline dans les der

In [23]:
# Set the language to Arabic
wikipedia.set_lang("ar")

# Retrieve the summary for "Data Analysis" in Arabic
summary_arabic = wikipedia.summary("تحليل البيانات")
print(summary_arabic)

بيانات: مفرد بيان- بيانات / مجموعة بيانات
1 - معلومات تفصيليّة حول شخص أو شيءٍ ما يمكن من خلالها الاستدلال عليه.
2 - (الحاسبات والمعلومات) رموز عدديّة وغيرها من المعلومات الممثَّلة بشكل ملائم لمعالجتها بالحاسوب.

تحليل البيانات أو المعطيات (بالإنجليزية: Data analysis)‏: هو عملية الفحص والتدقيق للبيانات، وتمشيطها لتكون أكثر دقة، واعادة تشكيلها، وتخزينها أيضا لنحصل ونستنبط في النهاية على معلومات يمكن على اساسها اتخاذ وتحديد القرارات. ولتحليل البيانات طرق عديدة تختلف باختلاف المجال المستخدمة فيه. حيث يمكننا استخدام تحليل البيانات في العلوم والعلوم الاجتماعية والمالية أيضا.
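wikipedia.set_lang() changes a module-wide setting: after the Arabic cell, every later search(), summary(), or page() call goes to Arabic Wikipedia until the language is set again. A context manager can scope the change to a single block. The temporary_language helper below is a hypothetical sketch; it does not read the current language but restores the one you pass in, English by default, which matches the package's default. The setter is passed as an argument, so the helper does not depend on the package itself.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def temporary_language(set_lang: Callable[[str], None], lang: str,
                       restore: str = "en") -> Iterator[None]:
    """Switch to `lang` for the duration of a with-block, then to `restore`."""
    set_lang(lang)
    try:
        yield
    finally:
        # Runs even if the block raises, so the language is always restored
        set_lang(restore)


# Example with the wikipedia package (requires network access):
# import wikipedia
# with temporary_language(wikipedia.set_lang, "fr"):
#     print(wikipedia.summary("Analyse de données"))
# # Calls made here go to English Wikipedia again.
```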


### 5. Getting the Page Title

In [18]:
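# set_lang("ar") above is still in effect; switch back to English Wikipedia
wikipedia.set_lang("en")

search_query = "Data Analysis"

# Retrieve the Wikipedia page object
page = wikipedia.page(search_query)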

# Get the title of the Wikipedia page
title = page.title

print("Title:", title)

Title: Data analysis
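The printed title is "Data analysis", not the query "Data Analysis". Wikipedia titles are case-sensitive after the first character, and by default page() runs with auto_suggest=True, which looks the query up through search before loading the page, so it lands on the canonical title. The hypothetical normalize_title below is a simplified sketch of MediaWiki's mechanical title rules only (underscores as spaces, collapsed whitespace, capitalized first letter); it does not resolve redirects or case differences after the first letter.

```python
def normalize_title(raw: str) -> str:
    """Simplified sketch of MediaWiki title normalization."""
    # Underscores and spaces are equivalent; runs of whitespace collapse to one
    title = " ".join(raw.replace("_", " ").split())
    # Wikipedia capitalizes the first character of a title
    return title[:1].upper() + title[1:]


print(normalize_title("data_analysis"))  # Data analysis
print(normalize_title("Data Analysis"))  # Data Analysis (later letters keep their case)
```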


### 6. Getting the Article URL

In [20]:
search_query = "Data Analysis"

page = wikipedia.page(search_query)

url = page.url

print("URL:", url)

URL: https://en.wikipedia.org/wiki/Data_analysis
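page.url comes from the package, but the address follows a predictable pattern: the language prefix as the subdomain, then /wiki/ and the title with spaces replaced by underscores. The hypothetical article_url helper below is a sketch under that assumption, using urllib.parse.quote to percent-encode characters such as Arabic script. The URL returned by page.url remains the authoritative one and may encode some characters differently.

```python
from urllib.parse import quote


def article_url(title: str, lang: str = "en") -> str:
    """Build a Wikipedia article URL from a page title (illustrative sketch)."""
    # Spaces become underscores; characters other than letters, digits,
    # "_.-~" and "/" are percent-encoded as UTF-8 by quote().
    path = quote(title.replace(" ", "_"))
    return f"https://{lang}.wikipedia.org/wiki/{path}"


print(article_url("Data analysis"))  # https://en.wikipedia.org/wiki/Data_analysis
print(article_url("تحليل البيانات", lang="ar"))
```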


### 7. Getting the Image URLs

In [21]:
search_query = "Data Analysis"

page = wikipedia.page(search_query)

image_urls = page.images

# Print each image URL
print("Images:")
for url in image_urls:
    print(url)


Images:
https://upload.wikimedia.org/wikipedia/commons/b/ba/Data_visualization_process_v1.png
https://upload.wikimedia.org/wikipedia/commons/5/54/Rayleigh-Taylor_instability.jpg
https://upload.wikimedia.org/wikipedia/commons/e/ee/Relationship_of_data%2C_information_and_intelligence.png
https://upload.wikimedia.org/wikipedia/commons/9/9b/Social_Network_Analysis_Visualization.png
https://upload.wikimedia.org/wikipedia/commons/f/fb/Total_Revenues_and_Outlays_as_Percent_GDP_2013.png
https://upload.wikimedia.org/wikipedia/commons/7/7e/U.S._Phillips_Curve_2000_to_2013.png
https://upload.wikimedia.org/wikipedia/commons/d/db/US_Employment_Statistics_-_March_2015.png
https://upload.wikimedia.org/wikipedia/commons/8/80/User-activities.png
https://upload.wikimedia.org/wikipedia/commons/0/0b/Wikiversity_logo_2017.svg
https://upload.wikimedia.org/wikipedia/en/8/8a/OOjs_UI_icon_edit-ltr-progressive.svg
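page.images lists every file the page uses, which includes interface graphics such as the Wikiversity logo and the edit icon at the end of the output above. The sketch below filters the list by file extension and decodes the file names (for example, %2C becomes a comma). The extension set and the helper names raster_images and file_name are assumptions for illustration; filtering out .svg also drops any SVG diagrams that are genuine article content.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse

# Extensions treated as content images in this sketch (an assumption)
RASTER_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif"}


def raster_images(urls):
    """Keep only URLs whose file extension is in RASTER_EXTENSIONS."""
    return [
        url for url in urls
        if PurePosixPath(urlparse(url).path).suffix.lower() in RASTER_EXTENSIONS
    ]


def file_name(url: str) -> str:
    """Return the percent-decoded file name at the end of an image URL."""
    return unquote(PurePosixPath(urlparse(url).path).name)


# Image URLs copied from the output above
sample_urls = [
    "https://upload.wikimedia.org/wikipedia/commons/b/ba/Data_visualization_process_v1.png",
    "https://upload.wikimedia.org/wikipedia/commons/5/54/Rayleigh-Taylor_instability.jpg",
    "https://upload.wikimedia.org/wikipedia/commons/e/ee/Relationship_of_data%2C_information_and_intelligence.png",
    "https://upload.wikimedia.org/wikipedia/commons/9/9b/Social_Network_Analysis_Visualization.png",
    "https://upload.wikimedia.org/wikipedia/commons/f/fb/Total_Revenues_and_Outlays_as_Percent_GDP_2013.png",
    "https://upload.wikimedia.org/wikipedia/commons/7/7e/U.S._Phillips_Curve_2000_to_2013.png",
    "https://upload.wikimedia.org/wikipedia/commons/d/db/US_Employment_Statistics_-_March_2015.png",
    "https://upload.wikimedia.org/wikipedia/commons/8/80/User-activities.png",
    "https://upload.wikimedia.org/wikipedia/commons/0/0b/Wikiversity_logo_2017.svg",
    "https://upload.wikimedia.org/wikipedia/en/8/8a/OOjs_UI_icon_edit-ltr-progressive.svg",
]

kept = raster_images(sample_urls)
print(len(kept))  # 8
for url in kept:
    print(file_name(url))
```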
