# Data Parsing

**Web scraping** is a way of reading data published on web pages so that it can be organized and analyzed.

Scraping tools retrieve new or updated data automatically, so a dataset can be refreshed without manual copying.

[Example site](https://www.insolvencydirect.bis.gov.uk/compulsoryliquidation/piu/viewqryl.asp)  

A web scraping procedure can be split into two stages:

1. sending a request to the website and downloading the page's source code;
2. extracting the content of the web page.
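
The two stages can be sketched with the Python standard library alone. This is a minimal illustration, not the approach used in the rest of the notebook: `fetch_html`, `TitleParser` and `extract_title` are hypothetical names, and `urllib.request` and `html.parser` stand in for the dedicated libraries.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Stage 1: send a request and return the page source as text."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class TitleParser(HTMLParser):
    """Stage 2: extract one piece of content, here the <title> text."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html):
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


# Stage 2 can be tried offline on a literal string.
page = "<html><head><title>Some title</title></head><body></body></html>"
print(extract_title(page))
```

Writing a parser class for every element quickly becomes tedious, which is the main reason to reach for a dedicated parsing library.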


# The [Beautiful Soup](<https://www.crummy.com/software/BeautifulSoup/bs4/doc/>) library


## Test HTML

In [None]:
test = '''
    <html>
        <head><title>Some title</title></head>
        <body>
            <div class="first_level">
                <h2 align='center'> Some text </h2>
                <h2 align='left'> Another text </h2>
            </div>
            <h2> Last <b>text</b> </h2>
        </body>
    </html>
'''

### Creating a ```BeautifulSoup``` object

In [None]:
from bs4 import BeautifulSoup
soup = BeautifulSoup(test, 'html.parser')
soup


<html>
<head><title>Some title</title></head>
<body>
<div class="first_level">
<h2 align="center"> Some text </h2>
<h2 align="left"> Another text </h2>
</div>
<h2> Last <b>text</b> </h2>
</body>
</html>

### Basic Beautiful Soup commands for navigating an HTML document

In [None]:
soup.title

<title>Some title</title>

In [None]:
soup.title.name

'title'

In [None]:
soup.title.string

'Some title'

In [None]:
soup.title.parent.name

'head'

In [None]:
soup.div

<div class="first_level">
<h2 align="center"> Some text </h2>
<h2 align="left"> Another text </h2>
</div>

In [None]:
soup.div['class']

['first_level']

In [None]:
soup.div.h2

<h2 align="center"> Some text </h2>

In [None]:
soup.h2

<h2 align="center"> Some text </h2>

In [None]:
soup.find_all('h2')

[<h2 align="center"> Some text </h2>,
 <h2 align="left"> Another text </h2>,
 <h2> Last <b>text</b> </h2>]

In [None]:
type(soup), type(soup.title), type(soup.div), type(soup.find_all('h2'))

(bs4.BeautifulSoup, bs4.element.Tag, bs4.element.Tag, bs4.element.ResultSet)

#### h2 tags whose align attribute equals center

In [None]:
soup.find_all("h2", align='center')

[<h2 align="center"> Some text </h2>]

#### h2 tags that have an align attribute

In [None]:
soup.find_all('h2', align=True)

[<h2 align="center"> Some text </h2>, <h2 align="left"> Another text </h2>]
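
The same attribute filters can also be written with an explicit `attrs` dictionary or as CSS selectors through `select`. A minimal sketch on a shortened copy of the test document (the `doc` string is introduced here only for illustration):

```python
from bs4 import BeautifulSoup

doc = '''
<div class="first_level">
    <h2 align='center'> Some text </h2>
    <h2 align='left'> Another text </h2>
</div>
<h2> Last <b>text</b> </h2>
'''
soup = BeautifulSoup(doc, 'html.parser')

# keyword filter and attrs dictionary are equivalent
by_keyword = soup.find_all('h2', align='center')
by_attrs = soup.find_all('h2', attrs={'align': 'center'})

# CSS selectors: [align] means "has the attribute", [align="center"] means "equals"
with_align = soup.select('h2[align]')
centered = soup.select('h2[align="center"]')

print(len(with_align), len(centered))
```

The `attrs` form is the one to use when an attribute name clashes with a Python keyword or a `find_all` parameter.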

#### Text inside tags

In [None]:
list(soup.find_all('h2')[2].children)

[' Last ', <b>text</b>, ' ']

In [None]:
s = list(soup.find_all('h2')[2].children)[0]
s

' Last '

In [None]:
type(s)

bs4.element.NavigableString

In [None]:
print(soup.get_text())



Some title


 Some text 
 Another text 

 Last text 
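
The blank lines in this output come from the whitespace between tags. `get_text` accepts a separator and a `strip` flag, and `stripped_strings` yields the same fragments one by one. A small sketch on a compact document (the `doc` string is illustrative):

```python
from bs4 import BeautifulSoup

doc = '''
<html>
    <head><title>Some title</title></head>
    <body>
        <h2> Some text </h2>
        <h2> Last <b>text</b> </h2>
    </body>
</html>
'''
soup = BeautifulSoup(doc, 'html.parser')

# strip each text fragment, drop the empty ones, join the rest with a space
compact = soup.get_text(' ', strip=True)

# the same fragments as a list
pieces = list(soup.stripped_strings)

print(compact)
print(pieces)
```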





## A real HTML document
As an example, take an article from Wikipedia:  
<https://en.wikipedia.org/wiki/Saint_Petersburg>

### The [requests](https://2.python-requests.org/en/latest/) library

#### Sending a GET request to the web server and downloading the HTML

In [None]:
import requests
r = requests.get('https://en.wikipedia.org/w/index.php?title=Saint_Petersburg&oldid=1006889566')
r # response object

<Response [200]>

In [None]:
r.status_code

200
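
A slightly more defensive request might set a timeout and raise on HTTP errors instead of inspecting `status_code` by hand. The sketch below also shows how the query string of the URL above maps onto the `params` argument; `fetch_revision` is a hypothetical helper, and the 10-second timeout is an arbitrary choice.

```python
import requests

BASE_URL = 'https://en.wikipedia.org/w/index.php'
PARAMS = {'title': 'Saint_Petersburg', 'oldid': 1006889566}


def fetch_revision(timeout=10):
    """Download the fixed revision and return the response object."""
    response = requests.get(BASE_URL, params=PARAMS, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx answers
    return response


# Building the request without sending it shows the final URL.
prepared = requests.Request('GET', BASE_URL, params=PARAMS).prepare()
print(prepared.url)
```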

In [None]:
r.content



In [None]:
r.text
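
`r.content` holds the raw bytes of the response, while `r.text` is the same payload decoded to `str` using the encoding that requests determined. Beautiful Soup accepts either one; given bytes, it works out the encoding itself. A minimal offline sketch, where the `raw` bytes stand in for a downloaded response body:

```python
from bs4 import BeautifulSoup

# stand-in for r.content: UTF-8 encoded bytes
raw = ('<html><head><meta charset="utf-8"/>'
       '<title>Санкт-Петербург</title></head><body></body></html>').encode('utf-8')

# stand-in for r.text: the decoded string
text = raw.decode('utf-8')

from_bytes = BeautifulSoup(raw, 'html.parser')
from_text = BeautifulSoup(text, 'html.parser')

print(type(raw), type(text))
print(from_bytes.title.string == from_text.title.string)
```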



### Parsing the downloaded HTML

In [None]:
bs_sp = BeautifulSoup(r.content, 'html.parser')

In [None]:
print(bs_sp.prettify()[:200])

<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   Saint Petersburg - Wikipedia
  </title>
  <script>
   document.documentElement.className="


#### Selecting the top-level elements of the page

In [None]:
bs_sp.children

<list_iterator at 0x7fba19b03210>

In [None]:
[type(item) for item in list(bs_sp.children)]

[bs4.element.Doctype, bs4.element.NavigableString, bs4.element.Tag]

In [None]:
list(bs_sp.children)[0]

'html'

In [None]:
list(bs_sp.children)[1]

'\n'

The `Doctype` object holds information about the document type.

The `NavigableString` object is text found in the document (here, the newline between the doctype and `<html>`).

The `Tag` object is the `<html>...</html>` tag.
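
When only the tags are of interest, the whitespace strings and the doctype can be skipped by asking for direct children that are tags. A minimal sketch on a small stand-in document (`doc` is illustrative); `find_all(True, recursive=False)` matches any tag, but only one level down:

```python
from bs4 import BeautifulSoup

doc = '<!DOCTYPE html>\n<html>\n<head></head>\n<body></body>\n</html>'
soup = BeautifulSoup(doc, 'html.parser')

# all top-level nodes: doctype, a newline string and the <html> tag
top_level_types = [type(node).__name__ for node in soup.children]

# only tags, one level deep
top_tags = [tag.name for tag in soup.find_all(True, recursive=False)]
html_tags = [tag.name for tag in soup.html.find_all(True, recursive=False)]

print(top_level_types)
print(top_tags, html_tags)
```

This avoids indexing into `list(...children)` with positions that depend on how much whitespace the page happens to contain.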

In [None]:
html = list(bs_sp.children)[2]

In [None]:
len(list(html.children))

5

In [None]:
from bs4 import NavigableString
[(type(item), item if type(item) is NavigableString else item.name) for item in list(html.children)]

[(bs4.element.NavigableString, '\n'),
 (bs4.element.Tag, 'head'),
 (bs4.element.NavigableString, '\n'),
 (bs4.element.Tag, 'body'),
 (bs4.element.NavigableString, '\n')]

In [None]:
body = list(html.children)[3]

In [None]:
[(type(item), item if type(item) is NavigableString else item.name) for item in list(body.children)]

[(bs4.element.Tag, 'div'),
 (bs4.element.NavigableString, '\n'),
 (bs4.element.Tag, 'div'),
 (bs4.element.NavigableString, '\n'),
 (bs4.element.Tag, 'div'),
 (bs4.element.NavigableString, '\n'),
 (bs4.element.Tag, 'div'),
 (bs4.element.NavigableString, '\n'),
 (bs4.element.Tag, 'div'),
 (bs4.element.NavigableString, '\n'),
 (bs4.element.Tag, 'footer'),
 (bs4.element.NavigableString, '\n'),
 (bs4.element.Tag, 'script'),
 (bs4.element.NavigableString, '\n'),
 (bs4.element.Tag, 'script'),
 (bs4.element.NavigableString, '\n'),
 (bs4.element.Tag, 'script'),
 (bs4.element.NavigableString, '\n')]

In [None]:
print(body.get_text())









Saint Petersburg

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Danloud (talk | contribs) at 10:24, 15 February 2021. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.Revision as of 10:24, 15 February 2021 by Danloud (talk | contribs)(diff) ← Previous revision | Latest revision (diff) | Newer revision → (diff)


Jump to navigation
Jump to search
Federal city in Russia
.mw-parser-output .hatnote{font-style:italic}.mw-parser-output div.hatnote{padding-left:1.6em;margin-bottom:0.5em}.mw-parser-output .hatnote i{font-style:normal}.mw-parser-output .hatnote+link+.hatnote{margin-top:-0.5em}This article is about the city in Russia. For the city in the U.S. state of Florida, see St. Petersburg, Florida. For other uses, see Saint Petersburg (disambiguation).
"Leningrad" redirects here. For other uses, see Leningrad (disambiguation).
"Petrograd" redirects here. Not to be con

#### Extracting information from the article

##### Main information

In [None]:
div_parser_otpt = body.find("div", {'class' : 'mw-parser-output'})
div_parser_otpt

<div class="mw-parser-output"><div class="shortdescription nomobile noexcerpt noprint searchaux" style="display:none">Federal city in Russia</div>
<style data-mw-deduplicate="TemplateStyles:r1033289096">.mw-parser-output .hatnote{font-style:italic}.mw-parser-output div.hatnote{padding-left:1.6em;margin-bottom:0.5em}.mw-parser-output .hatnote i{font-style:normal}.mw-parser-output .hatnote+link+.hatnote{margin-top:-0.5em}</style><div class="hatnote navigation-not-searchable" role="note">This article is about the city in Russia. For the city in the U.S. state of Florida, see <a href="/wiki/St._Petersburg,_Florida" title="St. Petersburg, Florida">St. Petersburg, Florida</a>. For other uses, see <a class="mw-disambig" href="/wiki/Saint_Petersburg_(disambiguation)" title="Saint Petersburg (disambiguation)">Saint Petersburg (disambiguation)</a>.</div>
<link href="mw-data:TemplateStyles:r1033289096" rel="mw-deduplicated-inline-style"/><div class="hatnote navigation-not-searchable" role="note">

In [None]:
type(div_parser_otpt)

bs4.element.Tag

In [None]:
div_parser_otpt.find_all('p')[1:4]

[<p><b>Saint Petersburg</b> (Russian: <span title="Russian-language text"><span lang="ru">Санкт-Петербург</span></span>, <small><a href="/wiki/Romanization_of_Russian" title="Romanization of Russian">tr.</a></small> <span title="Russian-language text"><i lang="ru-Latn">Sankt-Peterburg</i></span>, <small>IPA: </small><span class="IPA" title="Representation in the International Phonetic Alphabet (IPA)"><a href="/wiki/Help:IPA/Russian" title="Help:IPA/Russian">[ˈsankt pʲɪtʲɪrˈburk]</a></span> <span class="nowrap" style="font-size:85%">(<span class="unicode haudio"><span class="fn"><span style="white-space:nowrap;margin-right:.25em;"><a href="/wiki/File:Ru-Sankt_Peterburg_Leningrad_Petrograd_Piter.ogg" title="About this sound"><img alt="audio speaker icon" data-file-height="20" data-file-width="20" decoding="async" height="11" src="//upload.wikimedia.org/wikipedia/commons/thumb/8/8a/Loudspeaker.svg/11px-Loudspeaker.svg.png" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/8/8a/Loudsp

In [None]:
main_info_list = [p.get_text() for p in div_parser_otpt.find_all('p')[1:4]]
main_info_list

["Saint Petersburg (Russian: Санкт-Петербург, tr. Sankt-Peterburg, IPA:\xa0[ˈsankt pʲɪtʲɪrˈburk] (listen)), formerly known as Petrograd (1914–1924) and later Leningrad (1924–1991), is the second-largest city in Russia. The city is situated on the Neva River, at the head of the Gulf of Finland on the Baltic Sea, with a population of roughly 5.4\xa0million residents.[9] It is the fourth-most populous city in Europe, the most populous city on the Baltic Sea, as well as the world's northernmost city with over 1\xa0million residents. As an important Russian port on the Baltic Sea, it is governed as a federal city.\n",
 "The city was founded by Tsar Peter the Great on 27 May 1703 on the site of a captured Swedish fortress, and was named after apostle Saint Peter. Saint Petersburg is historically and culturally associated with the birth of the Russian Empire and Russia's entry into modern history as a European great power.[10] It served as a capital of the Tsardom of Russia and the subsequent

In [None]:

main_info_str = "\n".join(main_info_list)
main_info_str

'Saint Petersburg (Russian: Санкт-Петербург, tr. Sankt-Peterburg, IPA:\xa0[ˈsankt pʲɪtʲɪrˈburk] (listen)), formerly known as Petrograd (1914–1924) and later Leningrad (1924–1991), is the second-largest city in Russia. The city is situated on the Neva River, at the head of the Gulf of Finland on the Baltic Sea, with a population of roughly 5.4\xa0million residents.[9] It is the fourth-most populous city in Europe, the most populous city on the Baltic Sea, as well as the world\'s northernmost city with over 1\xa0million residents. As an important Russian port on the Baltic Sea, it is governed as a federal city.\n\nThe city was founded by Tsar Peter the Great on 27 May 1703 on the site of a captured Swedish fortress, and was named after apostle Saint Peter. Saint Petersburg is historically and culturally associated with the birth of the Russian Empire and Russia\'s entry into modern history as a European great power.[10] It served as a capital of the Tsardom of Russia and the subsequent R

In [None]:
from pprint import pprint
pprint(main_info_str)

('Saint Petersburg (Russian: Санкт-Петербург, tr. Sankt-Peterburg, IPA:\xa0'
 '[ˈsankt pʲɪtʲɪrˈburk] (listen)), formerly known as Petrograd (1914–1924) and '
 'later Leningrad (1924–1991), is the second-largest city in Russia. The city '
 'is situated on the Neva River, at the head of the Gulf of Finland on the '
 'Baltic Sea, with a population of roughly 5.4\xa0million residents.[9] It is '
 'the fourth-most populous city in Europe, the most populous city on the '
 "Baltic Sea, as well as the world's northernmost city with over 1\xa0million "
 'residents. As an important Russian port on the Baltic Sea, it is governed as '
 'a federal city.\n'
 '\n'
 'The city was founded by Tsar Peter the Great on 27 May 1703 on the site of a '
 'captured Swedish fortress, and was named after apostle Saint Peter. Saint '
 'Petersburg is historically and culturally associated with the birth of the '
 "Russian Empire and Russia's entry into modern history as a European great "
 'power.[10] It served as 
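
The extracted paragraphs still contain non-breaking spaces (`\xa0`) and footnote markers such as `[9]`. One possible cleanup, assuming the markers are always digits in square brackets (`clean_paragraph` is a hypothetical helper):

```python
import re

FOOTNOTE = re.compile(r'\[\d+\]')


def clean_paragraph(text):
    """Replace non-breaking spaces and drop numeric footnote markers."""
    text = text.replace('\xa0', ' ')
    text = FOOTNOTE.sub('', text)
    return text.strip()


sample = ('with a population of roughly 5.4\xa0million residents.[9] '
          'It is the fourth-most populous city in Europe.\n')
print(clean_paragraph(sample))
```

Bracketed text that is not purely numeric, such as the IPA transcription, is left untouched by this pattern.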

##### List of districts

In [None]:
len(div_parser_otpt.find_all('table'))

33

In [None]:
body.find_all('table')[0]

<table class="infobox ib-settlement vcard"><tbody><tr><th class="infobox-above" colspan="2"><div class="fn org">Saint Petersburg</div></th></tr><tr><td class="infobox-subheader" colspan="2"><div class="category"><a href="/wiki/Federal_cities_of_Russia" title="Federal cities of Russia">Federal city</a></div></td></tr><tr class="mergedtoprow ib-settlement-official"><td class="infobox-full-data" colspan="2">Санкт-Петербург</td></tr><tr class="mergedtoprow"><td class="infobox-full-data" colspan="2"><div style="background-color:white;border-collapse:collapse;border:1px solid white;width:266px;display:table;margin-left: auto; margin-right: auto;"><div style="display:table-row"><div style="display:table-cell;border-top:0;padding:2px 0 0 2px"><div style="display:table;background-color:white;border-collapse:collapse"><div style="display:table-row"><div style="display:table-cell;border-top:0;padding:0 2px 2px 0"><a class="image" href="/wiki/File:Winter_Palace_Panorama_3.jpg" title="The Winter Pa

In [None]:
div_parser_otpt.find_all('table', {'class' : None})

[<table style="text-align:left; border-collapse:collapse; width:100%;">
 <tbody><tr style="background:none"><th colspan="5" style="text-align:center;">Religion in Saint Petersburg as of 2012 (Sreda Arena Atlas)<sup class="reference" id="cite_ref-2012ArenaAtlas_72-0"><a href="#cite_note-2012ArenaAtlas-72">[70]</a></sup><sup class="reference" id="cite_ref-2012Arena-religion-maps_73-0"><a href="#cite_note-2012Arena-religion-maps-73">[71]</a></sup></th></tr>
 <tr style="font-size:88%; height:4px;">
 <td colspan="2" style="padding:0 4px; text-align:left;"></td>
 <td style="width:100px; text-align:left;"></td>
 <td colspan="2" style="padding:0 4px; width:1em; text-align:right;"></td>
 </tr>
 <tr>
 <td colspan="2" style="padding-left: 0.4em; padding-right: 0.4em; min-width: 8em;"><a href="/wiki/Russian_Orthodox_Church" title="Russian Orthodox Church">Russian Orthodoxy</a></td>
 <td style="width: 100px; border-left: solid 1px silver; border-right: solid 1px silver;"><div style="background:Dark

In [None]:
table = div_parser_otpt.find_all(lambda tag: tag.name == 'table' and not tag.attrs)
table

[<table>
 <tbody><tr>
 <td colspan="2">Saint Petersburg is divided into 18 administrative districts:
 </td>
 <td rowspan="3"><div class="floatright"><a class="image" href="/wiki/File:Spb_all_districts_2005_abc_rus.svg" title="Administrative divisions of the city of Saint Petersburg"><img alt="Administrative divisions of the city of Saint Petersburg" data-file-height="3437" data-file-width="3579" decoding="async" height="307" src="//upload.wikimedia.org/wikipedia/commons/thumb/d/d1/Spb_all_districts_2005_abc_rus.svg/320px-Spb_all_districts_2005_abc_rus.svg.png" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/d/d1/Spb_all_districts_2005_abc_rus.svg/480px-Spb_all_districts_2005_abc_rus.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/d/d1/Spb_all_districts_2005_abc_rus.svg/640px-Spb_all_districts_2005_abc_rus.svg.png 2x" width="320"/></a></div>
 </td></tr>
 <tr>
 <td>
 <ol><li><a href="/wiki/Admiralteysky_District" title="Admiralteysky District">Аdmiralteysky</a></li>
 

In [None]:
spb_districts = [li.get_text() for li in table[0].find_all('li')]
spb_districts

['Аdmiralteysky',
 'Vasileostrovsky',
 'Vyborgsky',
 'Kalininsky',
 'Кirovsky',
 'Kolpinsky',
 'Krasnogvardeysky',
 'Кrasnoselsky',
 'Kronshtadtsky',
 '',
 ' Kurortny',
 'Moskovsky',
 'Nevsky',
 'Petrogradsky',
 'Petrodvortsovy',
 'Primorsky',
 'Pushkinsky',
 'Frunzensky',
 'Tsentralny']

In [None]:
[li.get_text().strip() for li in html.find(id='Administrative_divisions').find_next('table').find_all('li', {'class': False})]

['Аdmiralteysky',
 'Vasileostrovsky',
 'Vyborgsky',
 'Kalininsky',
 'Кirovsky',
 'Kolpinsky',
 'Krasnogvardeysky',
 'Кrasnoselsky',
 'Kronshtadtsky',
 'Kurortny',
 'Moskovsky',
 'Nevsky',
 'Petrogradsky',
 'Petrodvortsovy',
 'Primorsky',
 'Pushkinsky',
 'Frunzensky',
 'Tsentralny']
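
The second approach avoids counting tables: it starts from the element with `id='Administrative_divisions'`, takes the first table that follows it, and keeps only `<li>` tags without a `class` attribute. A reduced sketch of that pattern; the markup below is a simplified stand-in, and the `mw-empty-elt` class on the empty item is an assumption about why the filter removes it:

```python
from bs4 import BeautifulSoup

snippet = '''
<h2><span id="Administrative_divisions">Administrative divisions</span></h2>
<p>Intro text</p>
<table><tr><td><ol>
  <li>Nevsky</li>
  <li class="mw-empty-elt"></li>
  <li> Kurortny</li>
</ol></td></tr></table>
'''
soup = BeautifulSoup(snippet, 'html.parser')

# jump from the section anchor to the first table after it
anchor = soup.find(id='Administrative_divisions')
table = anchor.find_next('table')

# {'class': False} keeps only tags that have no class attribute
names = [li.get_text().strip() for li in table.find_all('li', {'class': False})]
print(names)
```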

##### Climate table

In [None]:
climate_data = html.find(id='Climate').find_next('table')
climate_data

<table class="wikitable mw-collapsible" style="width:100%; text-align:center; line-height: 1.2em; margin:auto;">
<tbody><tr>
<th colspan="14">Climate data for Saint Petersburg 1881–present; extremes since 1743
</th></tr>
<tr>
<th scope="row">Month
</th>
<th scope="col">Jan
</th>
<th scope="col">Feb
</th>
<th scope="col">Mar
</th>
<th scope="col">Apr
</th>
<th scope="col">May
</th>
<th scope="col">Jun
</th>
<th scope="col">Jul
</th>
<th scope="col">Aug
</th>
<th scope="col">Sep
</th>
<th scope="col">Oct
</th>
<th scope="col">Nov
</th>
<th scope="col">Dec
</th>
<th scope="col" style="border-left-width:medium">Year
</th></tr>
<tr style="text-align: center;">
<th scope="row" style="height: 16px;">Record high °C (°F)
</th>
<td style="background: #FFE2C5; color:#000000;">8.7<br/>(47.7)
</td>
<td style="background: #FFD7B0; color:#000000;">10.2<br/>(50.4)
</td>
<td style="background: #FFB76F; color:#000000;">14.9<br/>(58.8)
</td>
<td style="background: #FF6F00; color:#000000;">25.3<br/>(77.5)

In [None]:
climate_data.find_all('tr')[0]

<tr>
<th colspan="14">Climate data for Saint Petersburg 1881–present; extremes since 1743
</th></tr>

In [None]:
cnames = [th.get_text().strip() for th in climate_data.find_all('tr')[1].find_all('th')]
cnames

['Month',
 'Jan',
 'Feb',
 'Mar',
 'Apr',
 'May',
 'Jun',
 'Jul',
 'Aug',
 'Sep',
 'Oct',
 'Nov',
 'Dec',
 'Year']

In [None]:
import pandas as pd

values = []
for tr in climate_data.find_all('tr'):
    cells = tr.find_all('td')
    row = [cell.text for cell in cells]
    # keep only rows that have exactly one value per month/year column
    if len(row) != len(cnames) - 1:
        continue
    values.append(row)

pd.DataFrame(values, columns=cnames[1:])   

Unnamed: 0,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec,Year
0,8.7(47.7)\n,10.2(50.4)\n,14.9(58.8)\n,25.3(77.5)\n,32.0(89.6)\n,34.6(94.3)\n,35.3(95.5)\n,37.1(98.8)\n,30.4(86.7)\n,21.0(69.8)\n,12.3(54.1)\n,10.9(51.6)\n,37.1(98.8)\n
1,−3.0(26.6)\n,−3.0(26.6)\n,2.0(35.6)\n,9.3(48.7)\n,16.0(60.8)\n,20.0(68.0)\n,23.0(73.4)\n,20.8(69.4)\n,15.0(59.0)\n,8.6(47.5)\n,2.0(35.6)\n,−1.5(29.3)\n,9.1(48.4)\n
2,−5.5(22.1)\n,−5.8(21.6)\n,−1.3(29.7)\n,5.1(41.2)\n,11.3(52.3)\n,15.7(60.3)\n,18.8(65.8)\n,16.9(62.4)\n,11.6(52.9)\n,6.2(43.2)\n,0.1(32.2)\n,−3.7(25.3)\n,5.8(42.4)\n
3,−8.0(17.6)\n,−8.5(16.7)\n,−4.2(24.4)\n,1.5(34.7)\n,7.0(44.6)\n,11.7(53.1)\n,15.0(59.0)\n,13.4(56.1)\n,8.8(47.8)\n,4.0(39.2)\n,−1.8(28.8)\n,−6.1(21.0)\n,2.7(36.9)\n
4,−35.9(−32.6)\n,−35.2(−31.4)\n,−29.9(−21.8)\n,−21.8(−7.2)\n,−6.6(20.1)\n,0.1(32.2)\n,4.9(40.8)\n,1.3(34.3)\n,−3.1(26.4)\n,−12.9(8.8)\n,−22.2(−8.0)\n,−34.4(−29.9)\n,−35.9(−32.6)\n
5,44(1.7)\n,33(1.3)\n,37(1.5)\n,31(1.2)\n,46(1.8)\n,71(2.8)\n,79(3.1)\n,83(3.3)\n,64(2.5)\n,68(2.7)\n,55(2.2)\n,51(2.0)\n,661(26.0)\n
6,9\n,7\n,10\n,13\n,16\n,18\n,17\n,17\n,20\n,20\n,16\n,10\n,173\n
7,17\n,17\n,10\n,3\n,0\n,0\n,0\n,0\n,0\n,2\n,9\n,17\n,75\n
8,86\n,84\n,79\n,69\n,65\n,69\n,71\n,76\n,80\n,83\n,86\n,87\n,78\n
9,22\n,54\n,125\n,180\n,260\n,276\n,267\n,213\n,129\n,70\n,27\n,13\n,"1,636\n"


In [None]:
values = []
# skip the two leading header rows and the two trailing rows
for tr in climate_data.find_all('tr')[2:-2]:
    cells = tr.find_all(['th', 'td'])
    row = [cell.text.strip() for cell in cells]
    values.append(row)

pd.DataFrame(values, columns=cnames).set_index('Month')

Month,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec,Year
Record high °C (°F),8.7(47.7),10.2(50.4),14.9(58.8),25.3(77.5),32.0(89.6),34.6(94.3),35.3(95.5),37.1(98.8),30.4(86.7),21.0(69.8),12.3(54.1),10.9(51.6),37.1(98.8)
Average high °C (°F),−3.0(26.6),−3.0(26.6),2.0(35.6),9.3(48.7),16.0(60.8),20.0(68.0),23.0(73.4),20.8(69.4),15.0(59.0),8.6(47.5),2.0(35.6),−1.5(29.3),9.1(48.4)
Daily mean °C (°F),−5.5(22.1),−5.8(21.6),−1.3(29.7),5.1(41.2),11.3(52.3),15.7(60.3),18.8(65.8),16.9(62.4),11.6(52.9),6.2(43.2),0.1(32.2),−3.7(25.3),5.8(42.4)
Average low °C (°F),−8.0(17.6),−8.5(16.7),−4.2(24.4),1.5(34.7),7.0(44.6),11.7(53.1),15.0(59.0),13.4(56.1),8.8(47.8),4.0(39.2),−1.8(28.8),−6.1(21.0),2.7(36.9)
Record low °C (°F),−35.9(−32.6),−35.2(−31.4),−29.9(−21.8),−21.8(−7.2),−6.6(20.1),0.1(32.2),4.9(40.8),1.3(34.3),−3.1(26.4),−12.9(8.8),−22.2(−8.0),−34.4(−29.9),−35.9(−32.6)
Average precipitation mm (inches),44(1.7),33(1.3),37(1.5),31(1.2),46(1.8),71(2.8),79(3.1),83(3.3),64(2.5),68(2.7),55(2.2),51(2.0),661(26.0)
Average rainy days,9,7,10,13,16,18,17,17,20,20,16,10,173
Average snowy days,17,17,10,3,0,0,0,0,0,2,9,17,75
Average relative humidity (%),86,84,79,69,65,69,71,76,80,83,86,87,78
Mean monthly sunshine hours,22,54,125,180,260,276,267,213,129,70,27,13,1636
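
All cells in this table are still strings, and negative values use the Unicode minus sign (`−`, U+2212), which `float` does not accept. A possible conversion, assuming the first number in a cell is the metric value (`to_number` is a hypothetical helper):

```python
def to_number(cell):
    """Return the value before the parenthesised conversion as a float."""
    metric = cell.split('(')[0]              # '−3.0(26.6)' -> '−3.0'
    metric = metric.replace('\u2212', '-')   # Unicode minus -> ASCII hyphen-minus
    metric = metric.replace(',', '')         # '1,636' -> '1636'
    return float(metric.strip())


cells = ['8.7(47.7)\n', '\u22123.0(26.6)', '44(1.7)', '1,636\n']
print([to_number(c) for c in cells])
```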


##### Images of landmarks

In [None]:
thumbs = div_parser_otpt.find_all('div', {'class' : 'thumb'})
thumbs

[<div class="thumb tright"><div class="thumbinner" style="width:222px;"><a class="image" href="/wiki/File:Bronze_Horseman_02.jpg"><img alt="" class="thumbimage" data-file-height="3000" data-file-width="4500" decoding="async" height="147" src="//upload.wikimedia.org/wikipedia/commons/thumb/3/3e/Bronze_Horseman_02.jpg/220px-Bronze_Horseman_02.jpg" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/3/3e/Bronze_Horseman_02.jpg/330px-Bronze_Horseman_02.jpg 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/3/3e/Bronze_Horseman_02.jpg/440px-Bronze_Horseman_02.jpg 2x" width="220"/></a> <div class="thumbcaption"><div class="magnify"><a class="internal" href="/wiki/File:Bronze_Horseman_02.jpg" title="Enlarge"></a></div>The <i><a href="/wiki/Bronze_Horseman" title="Bronze Horseman">Bronze Horseman</a></i>, monument to Peter the Great</div></div></div>,
 <div class="thumb tright"><div class="thumbinner" style="width:222px;"><a class="image" href="/wiki/File:St._Nicholas_Maritime_Cathedral,_

In [None]:
thumbs[22:31]

[<div class="thumb tleft"><div class="thumbinner" style="width:222px;"><a class="image" href="/wiki/File:Gazprom_tower_(Lakhta_Center)_St_Petersburg._Russia.jpg"><img alt="" class="thumbimage" data-file-height="4783" data-file-width="3483" decoding="async" height="302" src="//upload.wikimedia.org/wikipedia/commons/thumb/c/c6/Gazprom_tower_%28Lakhta_Center%29_St_Petersburg._Russia.jpg/220px-Gazprom_tower_%28Lakhta_Center%29_St_Petersburg._Russia.jpg" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/c/c6/Gazprom_tower_%28Lakhta_Center%29_St_Petersburg._Russia.jpg/330px-Gazprom_tower_%28Lakhta_Center%29_St_Petersburg._Russia.jpg 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/c/c6/Gazprom_tower_%28Lakhta_Center%29_St_Petersburg._Russia.jpg/440px-Gazprom_tower_%28Lakhta_Center%29_St_Petersburg._Russia.jpg 2x" width="220"/></a> <div class="thumbcaption"><div class="magnify"><a class="internal" href="/wiki/File:Gazprom_tower_(Lakhta_Center)_St_Petersburg._Russia.jpg" title="Enlarg

In [None]:
srcs = [div.img['src'] for div in thumbs[22:31]]
srcs

['//upload.wikimedia.org/wikipedia/commons/thumb/c/c6/Gazprom_tower_%28Lakhta_Center%29_St_Petersburg._Russia.jpg/220px-Gazprom_tower_%28Lakhta_Center%29_St_Petersburg._Russia.jpg',
 '//upload.wikimedia.org/wikipedia/commons/thumb/b/b0/Kazan_Cathedral_-_panoramio_%281%29.jpg/220px-Kazan_Cathedral_-_panoramio_%281%29.jpg',
 '//upload.wikimedia.org/wikipedia/commons/thumb/5/53/Saint_Isaac%27s_Square_SPB_%2801%29.jpg/220px-Saint_Isaac%27s_Square_SPB_%2801%29.jpg',
 '//upload.wikimedia.org/wikipedia/commons/thumb/1/17/RUS-2016-Aerial-SPB-Peter_and_Paul_Fortress_02.jpg/220px-RUS-2016-Aerial-SPB-Peter_and_Paul_Fortress_02.jpg',
 '//upload.wikimedia.org/wikipedia/commons/thumb/4/4e/Nevsky_Avenue_01.jpg/220px-Nevsky_Avenue_01.jpg',
 '//upload.wikimedia.org/wikipedia/commons/thumb/0/05/Saint_Petersburg_2019.jpg/220px-Saint_Petersburg_2019.jpg',
 '//upload.wikimedia.org/wikipedia/commons/thumb/0/07/The_Church_of_the_Saviour_on_Spilled_Blood_%2820956466968%29.jpg/220px-The_Church_of_the_Saviour_o

In [None]:
links = ['https:' + src for src in srcs]
links

['https://upload.wikimedia.org/wikipedia/commons/thumb/c/c6/Gazprom_tower_%28Lakhta_Center%29_St_Petersburg._Russia.jpg/220px-Gazprom_tower_%28Lakhta_Center%29_St_Petersburg._Russia.jpg',
 'https://upload.wikimedia.org/wikipedia/commons/thumb/b/b0/Kazan_Cathedral_-_panoramio_%281%29.jpg/220px-Kazan_Cathedral_-_panoramio_%281%29.jpg',
 'https://upload.wikimedia.org/wikipedia/commons/thumb/5/53/Saint_Isaac%27s_Square_SPB_%2801%29.jpg/220px-Saint_Isaac%27s_Square_SPB_%2801%29.jpg',
 'https://upload.wikimedia.org/wikipedia/commons/thumb/1/17/RUS-2016-Aerial-SPB-Peter_and_Paul_Fortress_02.jpg/220px-RUS-2016-Aerial-SPB-Peter_and_Paul_Fortress_02.jpg',
 'https://upload.wikimedia.org/wikipedia/commons/thumb/4/4e/Nevsky_Avenue_01.jpg/220px-Nevsky_Avenue_01.jpg',
 'https://upload.wikimedia.org/wikipedia/commons/thumb/0/05/Saint_Petersburg_2019.jpg/220px-Saint_Petersburg_2019.jpg',
 'https://upload.wikimedia.org/wikipedia/commons/thumb/0/07/The_Church_of_the_Saviour_on_Spilled_Blood_%282095646696
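
Prefixing `'https:'` works because these `src` values are protocol-relative (they start with `//`). `urllib.parse.urljoin` from the standard library handles that case and site-relative paths alike; the page URL used as the base here is an assumption for illustration:

```python
from urllib.parse import urljoin

PAGE_URL = 'https://en.wikipedia.org/wiki/Saint_Petersburg'

srcs = [
    # protocol-relative: only the scheme is taken from the base
    '//upload.wikimedia.org/wikipedia/commons/thumb/4/4e/Nevsky_Avenue_01.jpg/220px-Nevsky_Avenue_01.jpg',
    # site-relative: scheme and host are taken from the base
    '/wiki/File:Bronze_Horseman_02.jpg',
]
links = [urljoin(PAGE_URL, src) for src in srcs]
print(links)
```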

In [None]:
fnames = [link.split('/')[-1] for link in links]
fnames

['220px-Gazprom_tower_%28Lakhta_Center%29_St_Petersburg._Russia.jpg',
 '220px-Kazan_Cathedral_-_panoramio_%281%29.jpg',
 '220px-Saint_Isaac%27s_Square_SPB_%2801%29.jpg',
 '220px-RUS-2016-Aerial-SPB-Peter_and_Paul_Fortress_02.jpg',
 '220px-Nevsky_Avenue_01.jpg',
 '220px-Saint_Petersburg_2019.jpg',
 '220px-The_Church_of_the_Saviour_on_Spilled_Blood_%2820956466968%29.jpg',
 '220px-Smolny_Cathedral_SPB_02.jpg',
 '220px-RUS-2016-Aerial-SPB-Peterhof_Palace.jpg']
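
The file names keep their percent-encoding (`%28`, `%29`, `%27`). If readable names are preferred, they can be decoded with `urllib.parse.unquote`; `filename_from_url` is a hypothetical helper:

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def filename_from_url(url):
    """Take the last path segment of the URL and decode percent-escapes."""
    path = urlparse(url).path
    return unquote(PurePosixPath(path).name)


url = ('https://upload.wikimedia.org/wikipedia/commons/thumb/b/b0/'
       'Kazan_Cathedral_-_panoramio_%281%29.jpg/220px-Kazan_Cathedral_-_panoramio_%281%29.jpg')
print(filename_from_url(url))
```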

In [None]:
pwd

'/content'

In [None]:
mkdir spb_images

In [None]:
cd spb_images

/content/spb_images


In [None]:
ls

In [None]:
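def dwnld_image(url, save_to):
    # download the file at `url` and write its bytes to the path `save_to`
    response = requests.get(url, allow_redirects=True, timeout=30)
    response.raise_for_status()  # do not save an error page as an image
    with open(save_to, 'wb') as f:
        f.write(response.content)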

In [None]:
for l, fn in zip(links, fnames):
    dwnld_image(l, '/content/spb_images/' + fn)

In [None]:
ls

220px-Gazprom_tower_%28Lakhta_Center%29_St_Petersburg._Russia.jpg
220px-Kazan_Cathedral_-_panoramio_%281%29.jpg
220px-Nevsky_Avenue_01.jpg
220px-RUS-2016-Aerial-SPB-Peter_and_Paul_Fortress_02.jpg
220px-RUS-2016-Aerial-SPB-Peterhof_Palace.jpg
220px-Saint_Isaac%27s_Square_SPB_%2801%29.jpg
220px-Saint_Petersburg_2019.jpg
220px-Smolny_Cathedral_SPB_02.jpg
220px-The_Church_of_the_Saviour_on_Spilled_Blood_%2820956466968%29.jpg


In [None]:
from skimage.io import imread_collection

col_dir = '/content/spb_images/*.jpg'
col = imread_collection(col_dir)

In [None]:
import matplotlib.pyplot as plt

for img in col:
    plt.figure()
    plt.imshow(img)
    plt.show()

## Assignment

Repeat the same steps, but use the [LXML](https://lxml.de/) library instead of Beautiful Soup. In other words, add a new cell under each cell above that performs the same operation with functions from lxml.html.

# The LXML library

The documentation is extensive; the sections on HTML parsing are a good place to start:
*   https://lxml.de/parsing.html#parsing-html
*   https://lxml.de/lxmlhtml.html#parsing-html

In [None]:
from lxml import etree
xml_tree = etree.HTML(test)
xml_tree

<Element html at 0x7fba126b04b0>

In [None]:
from lxml.html import fromstring, parse
html_tree = fromstring(test)
html_tree

<Element html at 0x7fba126b4350>

In [None]:
type(xml_tree), type(html_tree)

(lxml.etree._Element, lxml.html.HtmlElement)

`lxml.html` provides HTML-specific methods, so it is the better choice here.
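
As a starting point, here are a few `lxml.html` counterparts of the Beautiful Soup calls used earlier, shown on a shortened copy of the test document. This is a sketch rather than a full solution, and XPath is only one of several ways to query the tree:

```python
from lxml.html import fromstring

doc = '''
<html>
    <head><title>Some title</title></head>
    <body>
        <h2 align='center'> Some text </h2>
        <h2 align='left'> Another text </h2>
        <h2> Last <b>text</b> </h2>
    </body>
</html>
'''
tree = fromstring(doc)

title = tree.findtext('.//title')                # like soup.title.string
all_h2 = tree.xpath('//h2')                      # like soup.find_all('h2')
centered = tree.xpath('//h2[@align="center"]')   # like find_all('h2', align='center')
last_text = all_h2[2].text_content()             # like get_text() on a single tag

print(title, len(all_h2), centered[0].get('align'), repr(last_text))
```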