In [1]:
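import asyncio
import datetime

import aiohttp
import dateutil.parser  # import the submodule explicitly; `import dateutil` alone may not load it in older versions
from bs4 import BeautifulSoup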


# Introduction 

This is the first component of the project advertised on my website.

The ResourceExtractor takes a list of manually clustered resources (which ensures quality) and extracts their HTML.

The next update to the extractor will improve how url_assigner handles resource grouping, so that it won't be necessary to initialise 'self.output' manually for each resource cluster.

Note that the design below is meant to eventually support a program that handles sources from many different websites.

The weekly updates will include:
- Precise terminology, plus commented and improved functions (input and output type specifications, etc.).
- A parser able to extract dates and structure the text.

**soon:**

- A processing pipeline (PySpark).
- A data storage mechanism (Pydoop).
(I do not have a server, so I will run Spark and Hadoop locally.)

When all of the above is completed, I will try to model the data and see whether it is possible to extract insights about the evolution of this war (a small anticipation of what I am thinking: *Mathematics and Politics: Strategy, Voting, Power, and Proof* by Alan D. Taylor and Allison M. Pacelli, [here](https://link.springer.com/book/10.1007/978-0-387-77645-3)).

# Section I - Retrieving HTML 

The purpose of the ResourceExtractor is to navigate the ISW website pages and retrieve their HTML. The result is stored in the variable 'output'. Note that the code is written in a general way: self.all_resources is a dictionary that allows multiple resources to be integrated. This does not mean that the code can be extended to an arbitrary number of resources: for that functionality to be available, it would be necessary to devise a generalised TextParser (defined in Section II). Since the present project is more an experiment with certain data engineering tools than a scraper, that generalisation **will be tackled upon completion**.
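One possible direction for the url_assigner improvement, sketched under the assumption that clusters remain a plain `{cluster: [urls]}` dictionary like self.all_resources: invert the dictionary once into a URL-to-cluster index, which gives constant-time lookups and catches a URL accidentally assigned to two clusters. `build_url_index` and the second cluster are hypothetical, not part of the notebook.

```python
# Hypothetical helper: invert {cluster: [urls]} into {url: cluster} once,
# instead of scanning every cluster list on each url_assigner call.
def build_url_index(all_resources):
    """Map each URL to its cluster, refusing URLs assigned to two clusters."""
    index = {}
    for cluster, urls in all_resources.items():
        for url in urls:
            previous = index.setdefault(url, cluster)
            if previous != cluster:
                raise ValueError(f'{url} is assigned to both {previous} and {cluster}')
    return index


sample_resources = {
    'ISW_Russia_Ukraine_War': [
        'https://www.understandingwar.org/backgrounder/ukraine-conflict-updates',
        'https://www.understandingwar.org/backgrounder/ukraine-conflicts-updates-january-2-may-31-2024',
    ],
    'Example_Cluster': ['https://example.org/page'],  # hypothetical second cluster
}

url_index = build_url_index(sample_resources)
print(url_index['https://example.org/page'])  # Example_Cluster
print(url_index.get('https://unknown.example', 'NOT IDENTIFIED'))  # NOT IDENTIFIED
```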

In [35]:
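class ResourceExtractor:
    """Download the raw HTML of every manually curated resource page."""

    def __init__(self):
        # Resource pages are clustered manually to guarantee their quality.
        self.russo_ukrainian_war_sources = [
            'https://www.understandingwar.org/backgrounder/ukraine-conflict-updates',
            'https://www.understandingwar.org/backgrounder/ukraine-conflicts-updates-january-2-may-31-2024',
        ]
        # One entry per resource cluster: {cluster name: [page URLs]}.
        self.all_resources = {'ISW_Russia_Ukraine_War': self.russo_ukrainian_war_sources}
        # One HTML buffer per cluster; pages of the same cluster are concatenated.
        self.output = {key: '' for key in self.all_resources}

    def url_assigner(self, url):
        """Return the cluster a URL belongs to, or a 'NOT IDENTIFIED' marker."""
        for key, value_list in self.all_resources.items():
            if url in value_list:
                return str(key)
        return f'{url} : NOT IDENTIFIED'

    async def text_extractor(self, session, url):
        """Fetch one page and return (cluster, HTML); the HTML is '' if the request fails."""
        key = self.url_assigner(url)
        async with session.get(url) as response:
            html = await response.text() if response.status == 200 else ''
        # All tasks start together, so this pause does not space the requests out.
        await asyncio.sleep(1.5)
        return key, html

    async def run_text_extractor(self):
        async with aiohttp.ClientSession() as session:
            tasks = [self.text_extractor(session, resource_page)
                     for resource_list in self.all_resources.values()
                     for resource_page in resource_list]
            # gather returns results in task order, so pages are concatenated
            # in the order they are listed, not in the order they finish.
            results = await asyncio.gather(*tasks)
        for key, html in results:
            if key in self.output:  # skip URLs that url_assigner could not identify
                self.output[key] += html
        return self.output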

In [36]:
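extractor = ResourceExtractor()
# Top-level await works inside Jupyter; in a plain script use asyncio.run(extractor.run_text_extractor()).
output = await extractor.run_text_extractor()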
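In text_extractor above, asyncio.gather starts every request at once and each task sleeps only after its own response, so the 1.5-second pause does not spread the requests out. If polite pacing is the goal, a common alternative is an asyncio.Semaphore that bounds how many requests are in flight and keeps each slot busy for a short delay. A minimal sketch, assuming `fetch` is any coroutine function that takes a URL; `fetch_all_throttled`, `max_concurrency` and `delay` are hypothetical names and values.

```python
import asyncio


async def fetch_all_throttled(urls, fetch, max_concurrency=2, delay=1.5):
    """Await fetch(url) for every URL with at most max_concurrency calls in flight.

    A slot is released only after the fetch *and* the delay, so the pause
    actually limits the request rate. Results come back in the order of `urls`.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(url):
        async with semaphore:
            result = await fetch(url)
            await asyncio.sleep(delay)  # hold the slot a little longer
            return result

    return await asyncio.gather(*(worker(url) for url in urls))
```

In the notebook, `fetch` could wrap session.get from the aiohttp.ClientSession opened in run_text_extractor, and the call would be awaited directly, as in the cell above.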

# Section II - Retrieving Textual Elements

In this section I define a class called TextParser. Note that the class is tailored to the websites being considered and cannot be applied to an arbitrary website.

There are two main options for generalising the methods of this class, depending on what is meant by "generalisation".
#### generalisation = integration of multiple resources
In this case it suffices to consider a finite set of websites and design methods that exploit the commonalities between them, or that change depending on the website. The scraper would then be generalised to n resources (whereas currently it can be applied to only one).
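For this first meaning of "generalisation", one common pattern is a registry keyed by hostname: each website gets its own extraction method, while the calling code stays the same. A minimal sketch, assuming paragraph-level extraction is enough for every site; `EXTRACTORS`, `register`, `isw_paragraphs` and `extract_paragraphs` are hypothetical names, and `ParagraphCollector` uses the standard-library html.parser as a dependency-free stand-in for BeautifulSoup's `find_all('p')`.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ParagraphCollector(HTMLParser):
    """Collect the text of each <p> element (stdlib stand-in for soup.find_all('p'))."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == 'p':
            self._flush()  # an unclosed <p> ends where the next one starts
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == 'p':
            self._flush()

    def handle_data(self, data):
        if self._in_p:
            self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()  # keep a trailing <p> that was never closed

    def _flush(self):
        if self._in_p:
            self.paragraphs.append(''.join(self._buffer).strip())
        self._in_p = False
        self._buffer = []


# Hypothetical registry: one extraction method per website, keyed by hostname.
EXTRACTORS = {}


def register(hostname):
    """Decorator that stores a site-specific extractor under its hostname."""
    def decorator(func):
        EXTRACTORS[hostname] = func
        return func
    return decorator


@register('www.understandingwar.org')
def isw_paragraphs(html_text):
    collector = ParagraphCollector()
    collector.feed(html_text)
    collector.close()
    return [p for p in collector.paragraphs if p]  # drop empty <p> elements


def extract_paragraphs(url, html_text):
    """Dispatch to the extractor registered for the URL's hostname."""
    hostname = urlparse(url).hostname
    if hostname not in EXTRACTORS:
        raise ValueError(f'no extractor registered for {hostname!r}')
    return EXTRACTORS[hostname](html_text)
```

Adding a website then means writing one function and decorating it with its hostname; nothing else in the pipeline changes.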

#### generalisation = widespread applicability
In this case we would like a set of methods that apply to any website, which requires an intelligent (or adaptive) program. Personally, I see an opportunity for Bayesian classifiers, but we'll see once the "important" parts of the project are completed: we're here to use PySpark and Hadoop!

In [106]:
class TextParser:
    """Split the scraped HTML into <p> paragraphs and print the text that follows the dates."""

    def __init__(self, resource_dictionary):
        self.resource_dictionary = resource_dictionary  # {cluster name: concatenated HTML}
        self.dates = []
        self.ps = []

    def page_dissecter(self):
        """Return, for each resource, the text of its <p> tags in page order."""
        soups = [BeautifulSoup(html_text, 'html.parser')
                 for html_text in self.resource_dictionary.values()]
        resource_p_tags = [[p_tag.text for p_tag in soup.find_all('p')] for soup in soups]
        return resource_p_tags

    def is_date(self, argument):
        """True if the whole string parses as a date, e.g. 'May 31, 2024, 6:45pm ET'."""
        try:
            possible_date = dateutil.parser.parse(argument)
            return isinstance(possible_date, datetime.datetime)
        except (ValueError, OverflowError):  # dateutil's ParserError subclasses ValueError
            return False

    # Here I want to return the text of the page, taking into account that the relevant text
    # comes after a date, e.g. "May 31, 2024 | this happened | June 3, 2024".
    # I do this with a try-except block: a paragraph that parses as a date is stored in
    # self.dates; any other paragraph is printed only once a date has been found, so the
    # general info at the beginning of the page is skipped.
    # Note that the dates in self.dates are not ordered: this will be solved in the next update.
    # Note that the real purpose of the shuffler is mapping dates to paragraphs; this will also
    # be included in the next update.
    # Note that the whole data-storage part should be reviewed because it is not efficient
    # (this will be one of the final steps).
    def shuffler(self):
        resource_p_tags = self.page_dissecter()
        date = None
        for resource in resource_p_tags:
            for paragraph in resource:
                try:
                    date = dateutil.parser.parse(paragraph)
                    self.dates.append(date)
                except (ValueError, OverflowError):
                    if date is not None:
                        print(paragraph)

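The shuffler comments name mapping dates to paragraphs as its real purpose, and note that self.dates is not ordered. A minimal sketch of that mapping, assuming date headings look like "May 31, 2024, 6:45pm ET" (as in the output below) and that each page is already reduced to a list of paragraph strings: each paragraph is filed under the most recent heading, text before the first heading is dropped, and the result is sorted chronologically. `HEADING`, `parse_heading` and `map_dates_to_paragraphs` are hypothetical; a strict regex replaces dateutil here so that body sentences starting with a date are not mistaken for headings.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed heading format: "May 31, 2024" optionally followed by ", 6:45pm ET".
HEADING = re.compile(r'([A-Z][a-z]+ \d{1,2}, \d{4})(?:, \d{1,2}:\d{2}\s?[ap]m ET)?')


def parse_heading(paragraph):
    """Return the calendar date if the whole paragraph is a date heading, else None."""
    match = HEADING.fullmatch(paragraph.strip())
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), '%B %d, %Y').date()
    except ValueError:  # e.g. the first word is not an English month name
        return None


def map_dates_to_paragraphs(paragraphs):
    """Group paragraphs under the closest preceding heading, oldest date first."""
    grouped = defaultdict(list)
    current = None
    for paragraph in paragraphs:
        heading = parse_heading(paragraph)
        if heading is not None:
            current = heading
        elif current is not None:  # skip the general info before the first date
            grouped[current].append(paragraph)
    return dict(sorted(grouped.items()))


paragraphs = ['General info', 'May 31, 2024, 6:45pm ET', 'US and German officials...',
              'May 30, 2024, 8:50pm ET', 'US President Joe Biden...']
for day, texts in map_dates_to_paragraphs(paragraphs).items():
    print(day, len(texts))  # 2024-05-30 1, then 2024-05-31 1
```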
In [108]:
tp = TextParser(output)
tp.shuffler()  # prints the paragraphs; the method itself returns None

## I copy-pasted part of the output because I suspect that GitHub can't render it

##### May 31, 2024, 6:45pm ET


US and German officials confirmed that the United States and Germany have changed their policies to allow Ukraine to use US- and German-provided weapons to strike Russian territory with some restrictions but did not offer precise details about these restrictions. Secretary of State Antony Blinken stated on May 31 that President Joe Biden approved Ukraine's use of US-supplied weapons to defend against Russian aggression, "including against Russian forces that are massing on the Russian side of the border and then attacking into Ukraine."[1] Western media reported on May 30 that the Biden administration gave Ukraine permission to use US-provided weapons, including GMLRS rockets, for "counter-fire purposes" against the Russian forces conducting assaults in northern Kharkiv Oblast but has not changed its policy restricting Ukraine from using US-provided weapons, such as ATACMS, to conduct long-range strikes elsewhere into Russia.[2] Blinken's May 31 statement did not specify which US-provided weapons Ukraine would be able to use or if the United States would allow Ukraine to use US-supplied weapons to strike Russian concentrations in Kursk and Bryansk oblasts as well. It is also unclear from Blinken's statement if the United States will allow Ukraine to strike Russian forces that are massing across the border but have not yet attacked into Ukrainian territory. Blinken responded to a journalist's question on May 31 about whether the United States would allow Ukraine to use US-provided weapons to strike deeper into Russian territory, stating that the United States will "as necessary adapt and adjust."[3]
German Federal Government Spokesperson Steffen Hebestreit stated on May 31 that Ukraine has a "right under international law to defend itself" against Russian attacks and that Ukraine can use German-provided weapons "for this purpose."[4] Hebestreit noted that Russian forces have attacked Ukraine "in the Kharkiv areas from positions in the immediately adjacent Russian border region" but did not specify whether Germany will only allow Ukraine to use German-provided weapons to strike Russian territory near Kharkiv Oblast. German Ambassador to the UK Miguel Berger, however, specifically stated on May 31 that the German government has allowed Ukraine to use German weapons to "defend itself against attacks on [Kharkiv Oblast] from bordering Russian territory," and select Western media similarly reported that Germany had geographically restricted Ukraine to use German-provided weapons against the adjacent Russian border area (presumably only Belgorod Oblast) to defend northern Kharkiv Oblast.[5] Other Western states continue to emphasize that they are imposing few to no restrictions on the use of weapons they are providing to Ukraine, however. Radio Svoboda reported on May 31 that Dutch Foreign Minister Hanke Bruins-Slot stated that the Netherlands does not oppose Ukraine's use of F-16s against military targets on Russian territory for self-defense.[6]
Ukrainian forces conducted a series of drone and missile strikes against a Russian long-range radar system in occupied Crimea and an oil depot in Krasnodar Krai on May 31 following the May 30 Ukrainian strike against the Kerch Strait ferry crossing. Ukrainian media, citing unspecified sources, reported on May 31 that Ukraine's Security Service (SBU) conducted a successful drone strike against a Russian "Nebo-IED" long-range radar system near occupied Armyansk, Crimea, and estimated that the system is worth $100 million.[7] The radar system reportedly serviced a 380-kilometer-long section of the frontline, and Ukrainian forces reportedly observed a shutdown of the radar's radiation signature following the drone strike, indicating that the strike took the system offline. The Ukrainian General Staff reported that Ukrainian forces conducted a strike on an oil depot near the port of Kavkaz, Krasnodar Krai with several Neptune anti-ship missiles early in the morning on May 31, and geolocated footage published on May 31 shows a fire at the oil depot.[8] Krasnodar Krai Governor Veniamin Kondratyev stated that Russian air defenses repelled an unspecified large number of Ukrainian drones targeting Krasnodar Krai and that the strike damaged three petroleum tanks at an oil depot in Temryuk Raion.[9] Russian opposition outlet Astra stated that Ukrainian forces struck at least two additional facilities at the port and damaged a substation that provides power to the Kerch Strait Bridge.[10] Russian milbloggers claimed that Ukrainian drones also struck a railway train carrying fuel near the oil depot.[11]
The Ukrainian General Staff reported on May 30 that Ukrainian forces conducted a successful ATACMS strike on a ferry crossing and damaged two ferries that Russian forces were using to transport forces and equipment across the Kerch Strait to occupied Crimea on the night of May 29 to 30.[12] Ukrainian Southern Operational Command Spokesperson Captain Third Rank Dmytro Pletenchuk stated on May 31 that Russian forces still rely on the ferry crossing because the railway line across the Kerch Strait Bridge is unfinished and that the strike should affect the provisioning of the Russian force grouping in occupied Crimea.[13] Russian sources issued conflicting reports on May 30 about the results of the May 29 to 30 Ukrainian strike – a Crimean occupation administration official claimed that the strike damaged two pilot boats, a car, and a section of the railway line, while Russia opposition outlet Astra stated that the strike sunk the Mechta pilot boat.[14] The port of Kavkaz reportedly specializes in servicing rail and truck ferry vessels, and the May 31 strike may be another aspect of Ukraine's strike against the ferry crossing.[15]
Ukraine signed long-term bilateral security agreements with Sweden, Iceland, and Norway on May 31. The Ukraine-Sweden agreement stipulates that Sweden will provide 6.5 billion euros (about $7 billion) of military assistance for the next decade, will transfer an unspecified amount of ASC 890 advanced early warning and control (AEW&C) aircraft, and continue efforts to transfer JAS 39 Gripen aircraft to Ukraine.[16] The Swedish military assistance package announced on May 29, worth about $1.25 billion and containing an ASC 890 aircraft, is likely part of this bilateral security agreement.[17] The Ukraine-Iceland agreement stipulates that Iceland will provide at least $30 million annually from 2024 to 2028 to finance and purchase defense materials and help develop Ukraine's defense industry.[18] The Ukraine-Norway agreement stipulates that Norway will provide assistance worth 75 billion kroner (about $7.1 billion) from 2023 to 2027, including at least 13.5 billion kroner (about $1.2 billion) in 2024.[19] Norway will also provide Ukraine with air and missile defense systems, including NASAMs, and help develop Ukraine's aircraft capabilities including with F-16 fighters.
Germany and Poland announced additional large military assistance packages for Ukraine. German Defense Minister Boris Pistorius announced on May 30 a package worth 500 million euros (about $542 million) that includes a Patriot air defense system, a "large number" of IRIS-T SLM air defense missiles, a smaller number of shorter-range IRIS-T SLS air defense missiles, reconnaissance and combat drones, and spare parts including artillery gun barrels.[20] Polish Foreign Minister Radoslaw Sikorski announced on May 31 that Poland is preparing a military assistance package for Ukraine worth four billion euros (about $4.3 billion).[21]
Russia's continued efforts to rally Collective Security Treaty Organization (CSTO) member countries around an imagined confrontation with the West likely stems from Russian concerns about the CSTO's longevity as a vector for Russian influence. Russian Defense Minister Andrei Belousov addressed a meeting of the CSTO Council of Defense Ministers in Almaty, Kazakhstan on May 31 and claimed that a tense situation in Eastern Europe and an alleged NATO military buildup threaten the security of CSTO members.[22] Belousov alleged that the US and its allies are a destabilizing geopolitical force and that NATO countries seek to strengthen their positions in the Caucasus and gain access to resources in the Caspian Sea and direct access to Central Asia.[23] Belousov warned that the West has unleashed an information war and sanctions against CSTO members to undermine the organization and called on CSTO members to coordinate their foreign policies to present a united front.[24] Belousov stated that Russia is specifically concerned about alleged US and NATO plans to involve nominal CSTO member Armenia in the West's sphere of interest.[25] Armenia has effectively ceased participation in the CSTO following Russia's failure to prevent Armenia's loss of Nagorno-Karabakh, and Armenia remains a CSTO member only in name.[26] The Kremlin has explicitly threatened Armenia if Armenia does not resume active engagement in the CSTO and return to a pro-Kremlin alignment.[27] Armenia has specifically questioned the value of its CSTO membership following the loss of Nagorno-Karabakh, and the Kremlin is likely concerned that deteriorating relations with Armenia could prompt other CSTO members to question the utility of their CSTO membership.[28] Recent tensions in the Russian-Tajik relationship following the March 2024 Crocus City Hall attack and Central Asian concerns about the impacts of secondary sanctions may be incentivizing the Kremlin to intensify efforts to convince CSTO members that the organization and their involvement in other Russian-led multilateral organizations is worthwhile.[29]
Kazakhstan, Kyrgyzstan, and Tajikistan are unlikely to buy into the Kremlin's imagined geopolitical confrontation with the West, and the Kremlin will likely have to offer more concrete promises to maintain the CSTO as a viable collective security organization oriented around Russian interests. Belousov met with Tajikistani Defense Minister Sherali Mirzo in a bilateral meeting on May 31 and stressed that the CSTO will address the escalating situation on the CSTO's southern border.[30] Belousov claimed that the situation in Afghanistan and the threat of terrorism remain the main sources of instability in Central Asia and that the CSTO must have timely responses to this threat, including strengthening the Tajikistan-Afghanistan border.[31] Russia is currently considering delisting the Taliban as a prohibited organization and will likely strengthen cooperation with the Taliban to combat the Islamic State’s Afghan branch IS-Khorasan (IS-K), which conducted the Crocus City Hall attack.[32] IS-K recruited Tajikistani citizens for the Crocus City Hall attack, and Tajikistan likely views multilateral counterterrorism operations as a way to repair strained relations with Russia while also combating transnational terrorist threats emanating from Afghanistan.[33] Russian President Vladimir Putin met with the Russian Security Council on May 31 and also emphasized strengthening international cooperation on counterterrorism.[34] Other Central Asian states, including CSTO members Kyrgyzstan and Kazakhstan, likely view Russian offers for counterterrorism cooperation as attractive benefits of continued security relations with Russia.
Although Russian forces made significant tactical gains in northern Kharkiv Oblast in early May 2024, Russian Defense Minister Andrei Belousov heavily overestimated Russian advances in Ukraine since the start of 2024. Belousov claimed on May 31 that Russian forces have seized 880 square kilometers thus far in 2024.[35] ISW has observed evidence confirming that Russian forces have only seized approximately 752 square kilometers in 2024, however. ISW previously assessed that Russian forces seized about 516 square kilometers between January 1, 2024, and April 29, 2024.[36]
Ukraine and Russia conducted a one-for-one prisoner of war (POW) exchange on May 31, the first POW exchange since February 8. Ukrainian and Russian officials announced that Ukraine and Russia exchanged 75 Ukrainian POWs for 75 Russian POWs, and the Russian Ministry of Defense (MoD) credited the United Arab Emirates with mediating the exchange.[37] Russian authorities recently blamed "far-fetched" Ukrainian demands for causing the several-month-long suspension of POW exchanges.[38]
The People's Republic of China (PRC) announced on May 31 that it will not join the June 2024 Ukraine peace summit. PRC Ministry of Foreign Affairs (MFA) Spokesperson Mao Ning stated on May 31 that the PRC will not attend the upcoming Ukraine peace summit in Switzerland because the meeting falls "far short of China's requests and expectations" and emphasized that "both Russia and Ukraine" should "endorse" the peace process.[39] Ukrainian President Volodymyr Zelensky previously stated that Ukraine will only be open to negotiations with Russia after developing a peace plan with its allies, and Ukrainian officials have recently emphasized that it is imperative for both the United States and China to attend the June 2024 peace summit as their participation is "decisive" in compelling Russia to participate in the process of restoring peace and security.[40] Senior Kremlin officials, including President Vladimir Putin, have recently endorsed the PRC's vague 12-point peace plan in Ukraine to falsely portray the Kremlin as willing to negotiate with Ukraine.[41] Senior Russian officials have repeatedly signaled that Russia is unwilling to engage in good-faith negotiations with Ukraine and has no interest in ending the war on terms that would prevent Putin from pursuing the destruction of an independent Ukraine.[42]

Angelica Evans, Nicole Wolkov, Kateryna Stepanenko, Riley Bailey, and George Barros

##### May 30, 2024, 8:50pm ET

Note: The data cut-off for this product was 1:30pm ET on May 30. ISW will cover subsequent reports in the May 31 Russian Offensive Campaign Assessment.

US President Joe Biden reportedly approved a policy change that will permit Ukraine to use US-provided weapons, including GMLRS rockets — but not longer-range ATACMS missiles — to strike within Russian territory near the border with Kharkiv Oblast. US officials and people familiar with the policy told Western media on May 30 that the Biden administration quietly gave Ukraine permission to use US-provided weapons for "counter-fire purposes" against the Russian forces conducting assaults in northern Kharkiv Oblast.[1] An unnamed US official clarified that the Biden administration has not changed its policy restricting Ukraine from using US-provided weapons to conduct long-range strikes, such as ATACMS, elsewhere into Russia. Several of Biden's advisors told The New York Times (NYT) in a story published on May 29 that a limited reversal of the US policy restricting strikes in Russia was "inevitable" and correctly assessed that the policy reversal would likely come with restrictions on how Ukraine could use US-provided weapons against military targets and forces just within Russia's borders that are actively involved in attacks and strikes on Ukraine.[2] The Washington Post reported that another unnamed US official stated that the US has placed no restriction on Ukraine's use of US-provided air defenses to shoot down Russian missiles or fighter jets over Russian territory "if they pose a threat to Ukraine."[3] NYT reported on May 22 that US Secretary of State Antony Blinken has been urging Biden to lift these restrictions on Ukraine.[4] It is unclear how far into Belgorod Oblast the US is permitting Ukrainian forces to strike with US-provided weapons, or if Ukraine would be allowed to strike Russian force and equipment concentrations in Kursk and Bryansk oblasts. 
Russian military targets outside the immediate border area with Kharkiv Oblast are also legitimate military targets, however, and continued restrictions on Ukraine's ability to strike targets elsewhere in Russia hinder Ukraine's ability to defend itself against Russian aggression. Russia still enjoys some sanctuary in which the Russian military can shield military forces before they get close enough to Kharkiv, or enter other parts of Ukraine. Russia will continue to benefit from any partial sanctuary so long as Western states continues to impose restrictions on Ukraine’s ability to defend itself. ISW continues to assess that the US should allow Ukraine to strike all legitimate military target in Russia’ operational and deep rear with US-provided weapons.

Ukraine's European allies continue to announce their support for allowing Ukraine to use Western-provided weapons to strike military targets in Russia. Danish Foreign Minister Lars Løkke Rasmussen confirmed during a press conference on May 30 in Brussels that Denmark will allow Ukraine to use Danish-provided weapons and promised F-16 fighter jets to strike military targets in Russia.[5] Rasmussen stated that this is not a new position and that Denmark has long made its support for Ukraine's right to strike military targets in Russia clear. Norwegian Foreign Minister Jan Lipavsky stated during a NATO ministerial meeting on May 30 that Ukraine should have the right to strike military targets in Russia.[6] Politico reported on May 29 that sources familiar with German Chancellor Olaf Scholz's positions stated that Scholz is now in favor of granting Ukraine permission to use Western weapons to strike military targets in Russia.[7] ISW assesses that the reversal of the policy will play a critical role in Ukraine's defense of its territory and future counteroffensive operations.[8]

Senior Ukrainian military officials reported that Russian forces are transferring forces to northern Kharkiv Oblast from other sectors of the frontline, indicating that the Russian military continues to prioritize efforts to draw and fix Ukrainian forces in northern Kharkiv Oblast. Ukrainian Commander-in-Chief Colonel General Oleksandr Syrskyi and the Ukrainian General Staff reported on May 30 that the Russian military is transferring elements of an unspecified number of additional regiments and brigades from other unspecified areas of the frontline and from training grounds to the Strilecha-Lyptsi (north of Kharkiv City) and Vovchansk (northeast of Kharkiv City) areas in northern Kharkiv Oblast.[9] Syrskyi reported that the Russian military does not have enough forces in northern Kharkiv Oblast to conduct a full-scale offensive and break through Ukrainian defenses, however. Kharkiv Oblast Military Administration Head Oleh Synehubov also reported that Russian forces are transferring reserves to the Lyptsi and Vovchansk directions to draw and fix as many Ukrainian forces in northern Kharkiv Oblast as possible and maintain the current tempo of Russian offensive operations in the area.[10] Synehubov stated that Russian forces have not concentrated a "strike group" near Zolochiv Hromada, Kharkiv Oblast (northwest of Kharkiv City) but that Russian forces could redirect forces in the Lyptsi and Vovchansk directions to the Zolochiv direction. 
Several Russian milbloggers purposefully misreported Synehubov's statements about possible evacuations in the event of Russian attacks and claimed that he had stated that Russian forces are preparing offensive operations in the Zolochiv direction.[11] The Russian military's transfer of reinforcements to Kharkiv Oblast indicates that the Russian military likely continues to prioritize efforts to draw and fix Ukrainian forces from critical sectors of the frontline in eastern Ukraine and establish a "buffer zone" in northern Kharkiv Oblast.[12] Russian forces likely intend to launch the second phase of their offensive operation in northern Kharkiv Oblast following their intended seizure of Vovchansk, although positional fighting and possible Ukrainian counterattacks could require Russian forces to conduct another wave of intensified assaults in the area to complete the seizure of the settlement. ISW continues to assess that Russian forces are likely holding back many of the reserves from the Northern Grouping of Forces, which is staffed with elements of the 11th Army Corps (AC), 44th AC, and 6th Combined Arms Army (CAA) — all part of the Russian Leningrad Military District (LMD) — until the Northern Grouping of Forces is closer to its reported planned end strength of 50,000 to 70,000 personnel.[13] The Northern Grouping of Forces, even at the upper limit of its reported end strength, will lack the necessary manpower needed to conduct a successful operation to envelop, encircle, or seize Kharkiv City.
