
# Data Challenge

Germany is divided into postcode areas. Documentation: https://worldpostalcode.com/germany/ and https://en.wikipedia.org/wiki/Postal_codes_in_Germany

**Tasks:**
- Build a small scraper that downloads a list of all German cities from Wikipedia (https://de.wikipedia.org/wiki/Liste_der_St%C3%A4dte_in_Deutschland) and gets the postcode(s) of each of these cities.
- Convert each postcode to latitude and longitude coordinates and return the results in a table or CSV file.
- Write a function that takes a postcode or city name and a radius (in km) as input and returns all postcodes within the radius.

## 1- Creating Virtual Environment On Google Colab

In [1]:
!which python
!python --version

/usr/local/bin/python
Python 3.11.7


In [2]:
# environment variable
%env PYTHONPATH=

env: PYTHONPATH=


In [3]:
# install virtual environment package
!pip install virtualenv

In [4]:
# create virtual environment
!virtualenv myenv

created virtual environment CPython3.11.7.final.0-64 in 261ms
  creator CPython3Posix(dest=/content/myenv, clear=False, no_vcs_ignore=False, global=False)
  seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/root/.local/share/virtualenv)
    added seed packages: pip==23.3.1, setuptools==69.0.2, wheel==0.42.0
  activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator


In [5]:
!wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

--2024-01-31 19:26:45--  https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
Resolving repo.anaconda.com (repo.anaconda.com)... 104.16.131.3, 104.16.130.3, 2606:4700::6810:8203, ...
Connecting to repo.anaconda.com (repo.anaconda.com)|104.16.131.3|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 141613749 (135M) [application/octet-stream]
Saving to: ‘Miniconda3-latest-Linux-x86_64.sh’


2024-01-31 19:26:46 (125 MB/s) - ‘Miniconda3-latest-Linux-x86_64.sh’ saved [141613749/141613749]



In [6]:
!chmod +x Miniconda3-latest-Linux-x86_64.sh

In [7]:
!./Miniconda3-latest-Linux-x86_64.sh -b -f -p /usr/local

PREFIX=/usr/local
Unpacking payload ...
                                                                                           
Installing base environment...


Downloading and Extracting Packages:


Downloading and Extracting Packages:

Preparing transaction: - done
Executing transaction: | / - \ | / - \ | / - \ | done
installation finished.


In [8]:
!conda install -q -y --prefix /usr/local python=3.11 ujson

Channels:
 - defaults
Platform: linux-64
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: /usr/local

  added / updated specs:
    - python=3.11
    - ujson


The following packages will be UPDATED:

  python                                  3.11.5-h955ad1f_0 --> 3.11.7-h955ad1f_0 


Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done


In [9]:
import sys
sys.path.append('/usr/local/lib/python3.11/site-packages/')

In [10]:
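# Point CONDA_PREFIX at the environment path.
# Note: this only sets a variable; it does not by itself activate a conda environment.
import os
os.environ['CONDA_PREFIX'] = '/usr/local/envs/myenv'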

In [11]:
# Python version in the new environment
!python --version

Python 3.11.7
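
Note that `!python --version` reports the interpreter found on the shell's PATH, which is not necessarily the one running the notebook kernel. A minimal sketch for checking the kernel's own interpreter from Python, using only the standard library (the helper name `interpreter_info` is made up for this example):

```python
import sys

def interpreter_info():
    """Return the executable path, environment prefix and version of the running interpreter."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
    }

info = interpreter_info()
for key, value in info.items():
    print(f"{key}: {value}")
```

If `prefix` does not point at the environment that was just set up, packages installed with `!pip install` may land in a different location than the one the kernel imports from.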


In [12]:
!pip freeze > requirements.txt

In [13]:
!pip install beautifulsoup4
!pip install requests
!pip install aiohttp
!pip install pandas

In [14]:
!pip install nest-asyncio
import nest_asyncio
nest_asyncio.apply()

In [15]:
import aiohttp
import asyncio
from bs4 import BeautifulSoup
import pandas as pd
import requests

## Build a small scraper that downloads a list of all German cities from Wikipedia and gets the postcode(s) of each of these cities.

In [16]:
url_main = 'https://de.wikipedia.org/wiki/Liste_der_St%C3%A4dte_in_Deutschland'
response = requests.get(url_main)
soup = BeautifulSoup(response.text, 'html.parser')

initial_data = soup.find_all('dd')  # every city name sits inside a <dd> tag

city_name = [city.find("a").text for city in initial_data]
# print(len(city_name))  # 2056, equal to the number of cities listed on the url_main page

city_href = [city.find("a").get("href") for city in initial_data]  # relative link to each city's article
# print(city_href)

In [17]:
city_name[:5]

['Aach', 'Aachen', 'Aalen', 'Abenberg', 'Abensberg']

In [18]:
city_href[:5]

['/wiki/Aach_(Hegau)',
 '/wiki/Aachen',
 '/wiki/Aalen',
 '/wiki/Abenberg',
 '/wiki/Abensberg']

In [19]:
class WebScraper(object):
    def __init__(self, urls):
        self.urls = urls
        # Results are stored here as {url: {'postcode': ...}}
        self.master_dict = {}
        # Run the scraper (inside a notebook this relies on nest_asyncio.apply() above)
        asyncio.run(self.main())

    async def fetch(self, session, url):
        try:
            async with session.get(url) as response:
                # 1. Download the HTML of the city page
                page_html = await response.text()

                # 2. Extract the postcode from the page
                postcode = self.extract_postcode(page_html)
                return url, postcode

        except Exception as e:
            # A failed request returns None and is skipped in main()
            print(f"{url}: {e}")

    def extract_postcode(self, page_html):
        try:
            soup_city = BeautifulSoup(page_html, 'html.parser')

            # The postcode is the text of the element right after the "Postleitzahl" link
            postcode_link = soup_city.find('a', attrs={'href': "/wiki/Postleitzahl_(Deutschland)"})
            return postcode_link.find_next().text

        except Exception as e:
            # Pages without that link end up here and yield a postcode of None
            print(str(e))

    async def main(self):
        headers = {
            "user-agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"}
        async with aiohttp.ClientSession(headers=headers) as session:
            tasks = [self.fetch(session, url) for url in self.urls]
            results = await asyncio.gather(*tasks)

            # Store the data, skipping requests that failed
            for result in results:
                if result is not None:
                    url, postcode = result
                    self.master_dict[url] = {'postcode': postcode}

In [20]:
url_base = 'https://de.wikipedia.org'

url_city = [url_base + href for href in city_href]

In [21]:
# We will use the following dictionary to clean the dataframe that we will obtain in future steps.
city_code = dict(zip(url_city,city_name))

In [22]:
# The fragment size for splitting the list
FRAGMENT_SIZE = 100

# Generate smaller fragments for scraping
fragments = [url_city[i:i + FRAGMENT_SIZE] for i in range(0, len(url_city), FRAGMENT_SIZE)]

# We will store the dataframes in the following list.
df_list = []

for index, fragment in enumerate(fragments):
    # Perform scraping with each fragment
    print(f"Perform scraping with the fragment: {index}")
    scraper = WebScraper(urls = fragment)

    df_list.append(pd.DataFrame.from_dict(scraper.master_dict, orient='index'))

Perform scraping with the fragment: 0
Perform scraping with the fragment: 1
Perform scraping with the fragment: 2
Perform scraping with the fragment: 3
Perform scraping with the fragment: 4
Perform scraping with the fragment: 5
Perform scraping with the fragment: 6
Perform scraping with the fragment: 7
Perform scraping with the fragment: 8
Perform scraping with the fragment: 9
Perform scraping with the fragment: 10
Perform scraping with the fragment: 11
Perform scraping with the fragment: 12
Perform scraping with the fragment: 13
Perform scraping with the fragment: 14
Perform scraping with the fragment: 15
Perform scraping with the fragment: 16
Perform scraping with the fragment: 17
Perform scraping with the fragment: 18
Perform scraping with the fragment: 19
Perform scraping with the fragment: 20


In [23]:
df_all = pd.concat(df_list)
df_all.shape

(2056, 1)

The scraper returned 2056 records, which matches the number of cities listed on the Wikipedia page.

In [24]:
df_all.head()

Unnamed: 0,postcode
https://de.wikipedia.org/wiki/A%C3%9Flar,"35614,35630 (Heinrichsegen)Vorlage:Infobox Gem..."
https://de.wikipedia.org/wiki/Aach_(Hegau),78267\n
https://de.wikipedia.org/wiki/Aachen,52062–52080\n
https://de.wikipedia.org/wiki/Aalen,"73430–73434, 73453\n"
https://de.wikipedia.org/wiki/Abenberg,91183\n


In [25]:
# Performing some data cleaning on the obtained dataframe.
df_all['name'] = df_all.index.map(city_code.get)
df_all.reset_index(level=0, inplace=True)
df_all = df_all.rename(columns={'index':'url'})
df_all['postcode'] = df_all['postcode'].str.replace('\\n','',regex = True)

In [26]:
df_all['first_postcode'] = df_all['postcode'].str.slice(0, 5)
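
Taking the first five characters keeps a single postcode per city, although the scraped strings often contain ranges ("52062–52080") or lists ("73430–73434, 73453"). A possible way to recover every postcode is sketched below with the standard `re` module. The function name `expand_postcodes` and the `max_range` guard are assumptions made for this example, and expanding a range may produce numbers that are not actually assigned postcodes.

```python
import re

def expand_postcodes(raw, max_range=1000):
    """Return the sorted 5-digit postcodes mentioned in a raw scraped string."""
    text = raw.replace("\u2013", "-")  # normalise en dashes to plain hyphens
    found = set()
    # Ranges such as "73430-73434": add every value from start to end
    for start, end in re.findall(r"(?<!\d)(\d{5})\s*-\s*(\d{5})(?!\d)", text):
        lo, hi = int(start), int(end)
        if lo <= hi and hi - lo < max_range:
            found.update(f"{n:05d}" for n in range(lo, hi + 1))
    # Stand-alone codes such as "73453" (runs of exactly five digits)
    found.update(re.findall(r"(?<!\d)\d{5}(?!\d)", text))
    return sorted(found)

print(expand_postcodes("73430–73434, 73453"))
# ['73430', '73431', '73432', '73433', '73434', '73453']
```

Strings in which two postcodes are fused into a ten-digit run contain no isolated five-digit group, so this sketch returns an empty list for them and they would still need manual handling.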

In [27]:
df_all.head()

Unnamed: 0,url,postcode,name,first_postcode
0,https://de.wikipedia.org/wiki/A%C3%9Flar,"35614,35630 (Heinrichsegen)Vorlage:Infobox Gem...",Aßlar,35614
1,https://de.wikipedia.org/wiki/Aach_(Hegau),78267,Aach,78267
2,https://de.wikipedia.org/wiki/Aachen,52062–52080,Aachen,52062
3,https://de.wikipedia.org/wiki/Aalen,"73430–73434, 73453",Aalen,73430
4,https://de.wikipedia.org/wiki/Abenberg,91183,Abenberg,91183


In [28]:
# Checking for the existence of duplicate values.
df_all[df_all['first_postcode'].duplicated()].sort_values('first_postcode')

Unnamed: 0,url,postcode,name,first_postcode
1670,https://de.wikipedia.org/wiki/Senftenberg,"01945 (Peickwitz)01968 (Brieske, Großkoschen, ...",Senftenberg,1945
1607,https://de.wikipedia.org/wiki/Sch%C3%B6newalde,04916,Schönewalde,4916
1815,https://de.wikipedia.org/wiki/Uebigau-Wahrenbr...,"04924 (Wahrenbrück (mit Zinsdorf), Beiersdorf,...",Uebigau-Wahrenbrück,4924
792,https://de.wikipedia.org/wiki/Hettstedt,06333,Hettstedt,6333
1657,https://de.wikipedia.org/wiki/Seeland_(Sachsen...,"06449 (Friedrichsaue, Schadeleben),06464 (Fros...",Seeland,6449
730,https://de.wikipedia.org/wiki/Harzgerode,06493,Harzgerode,6493
1779,https://de.wikipedia.org/wiki/Thale,06502,Thale,6502
1916,https://de.wikipedia.org/wiki/Wei%C3%9Fenfels,"06667, 06688",Weißenfels,6667
1020,https://de.wikipedia.org/wiki/L%C3%BCtzen,"06679, 06686",Lützen,6679
1357,https://de.wikipedia.org/wiki/Orlam%C3%BCnde,07768,Orlamünde,7768


## Convert the postcode to longitude and latitude coordinates and return the results in a table or csv.

Installing geopy to obtain the longitude and latitude.

In [29]:
!pip install geopy

Collecting geopy
  Downloading geopy-2.4.1-py3-none-any.whl.metadata (6.8 kB)
Collecting geographiclib<3,>=1.52 (from geopy)
  Downloading geographiclib-2.0-py3-none-any.whl (40 kB)
Downloading geopy-2.4.1-py3-none-any.whl (125 kB)
Installing collected packages: geographiclib, geopy
Successfully installed geographiclib-2.0 geopy-2.4.1

In [64]:
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter

import logging

# Hide "WARNING:urllib3.connectionpool:Retrying" messages
logging.getLogger(requests.packages.urllib3.__package__).setLevel(logging.ERROR)

# Create the geocoder once and send every request through the rate limiter
geolocator = Nominatim(user_agent="data-challenge2")
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)

def get_longitude_latitude(postcode):
    # Despite the name, the result is ordered (latitude, longitude)
    location = geocode({"postalcode": str(postcode)}, country_codes='de')
    if location:
        return location.latitude, location.longitude
    else:
        return None, None

def processing_df_to_obtain_lat_long(df, BATCH_SIZE):
    # The postcodes are processed in batches of BATCH_SIZE rows
    longitudes = []
    latitudes = []

    for index in range(0, len(df), BATCH_SIZE):
        # Obtain the coordinates for the current batch
        coordinates = df['first_postcode'].iloc[index:index + BATCH_SIZE].apply(get_longitude_latitude)
        lat, lon = zip(*coordinates)  # Separate the latitudes and longitudes
        latitudes.extend(lat)
        longitudes.extend(lon)
    # Add the latitude and longitude columns to the DataFrame
    df['latitude'] = latitudes
    df['longitude'] = longitudes
    return df

BATCH_SIZE = 100
df_all = processing_df_to_obtain_lat_long(df_all, BATCH_SIZE)

Checking for missing values

In [65]:
df_missing_coordenates = df_all[(df_all['latitude'].isnull()) | (df_all['longitude'].isnull())]
df_missing_coordenates

Unnamed: 0,url,postcode,name,first_postcode,latitude,longitude
9,https://de.wikipedia.org/wiki/Adenau,53511–53518,Adenau,53511,,
176,https://de.wikipedia.org/wiki/Bad_Wildungen,34521–34537,Bad Wildungen,34521,,
233,https://de.wikipedia.org/wiki/Bexbach,66441–66450,Bexbach,66441,,
379,https://de.wikipedia.org/wiki/Dettelbach,"97335, 97337",Dettelbach,97335,,
416,https://de.wikipedia.org/wiki/Dransfeld,3712537127,Dransfeld,37125,,
530,https://de.wikipedia.org/wiki/Frankenau,35109–35110,Frankenau,35109,,
603,https://de.wikipedia.org/wiki/G%C3%B6ttingen,37001–37099,Göttingen,37001,,
679,https://de.wikipedia.org/wiki/Gro%C3%9F-Gerau,64501–64521,Groß-Gerau,64501,,
746,https://de.wikipedia.org/wiki/Heidenau_(Sachsen),01801–01809,Heidenau,1801,,
789,https://de.wikipedia.org/wiki/Hessisch_Lichtenau,37230–37235,Hessisch Lichtenau,37230,,


For these rows we retry the lookup with the last postcode in the range instead of the first.

In [66]:
df_missing_coordenates_cp = df_missing_coordenates.copy()
df_missing_coordenates_cp["first_postcode"] = df_missing_coordenates_cp["postcode"].str.slice(-5)
df_missing_coordenates_cp

Unnamed: 0,url,postcode,name,first_postcode,latitude,longitude
9,https://de.wikipedia.org/wiki/Adenau,53511–53518,Adenau,53518,,
176,https://de.wikipedia.org/wiki/Bad_Wildungen,34521–34537,Bad Wildungen,34537,,
233,https://de.wikipedia.org/wiki/Bexbach,66441–66450,Bexbach,66450,,
379,https://de.wikipedia.org/wiki/Dettelbach,"97335, 97337",Dettelbach,97337,,
416,https://de.wikipedia.org/wiki/Dransfeld,3712537127,Dransfeld,37127,,
530,https://de.wikipedia.org/wiki/Frankenau,35109–35110,Frankenau,35110,,
603,https://de.wikipedia.org/wiki/G%C3%B6ttingen,37001–37099,Göttingen,37099,,
679,https://de.wikipedia.org/wiki/Gro%C3%9F-Gerau,64501–64521,Groß-Gerau,64521,,
746,https://de.wikipedia.org/wiki/Heidenau_(Sachsen),01801–01809,Heidenau,1809,,
789,https://de.wikipedia.org/wiki/Hessisch_Lichtenau,37230–37235,Hessisch Lichtenau,37235,,


In [103]:
df_missing_coordenates_cp = df_missing_coordenates_cp.drop(['latitude','longitude'],axis=1)
len(df_missing_coordenates_cp)

19
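
The retry with the last postcode of each range can be generalised: try every five-digit code found in the scraped string, in order, and stop at the first one that resolves. The sketch below assumes a lookup callable that returns `(lat, lon)` or `(None, None)`, which is the shape `get_longitude_latitude` has in this notebook. The names `first_resolvable` and `fake_lookup` are hypothetical, and `fake_lookup` is only a stand-in so the example runs without network access.

```python
import re

def first_resolvable(raw_postcode, lookup):
    """Try each 5-digit code in raw_postcode, in order, and return the first hit.

    lookup is any callable mapping a postcode string to (lat, lon) or (None, None).
    """
    for code in re.findall(r"(?<!\d)\d{5}(?!\d)", raw_postcode):
        lat, lon = lookup(code)
        if lat is not None and lon is not None:
            return code, lat, lon
    return None, None, None

# Stand-in for a geocoder: only 53518 is "known" here
known = {"53518": (50.382586, 6.929417)}

def fake_lookup(code):
    return known.get(code, (None, None))

print(first_resolvable("53511–53518", fake_lookup))
# ('53518', 50.382586, 6.929417)
```

With a real geocoder each attempt costs one request, so the rate limit still applies to every candidate that is tried.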

In [70]:
longitudes_missing = []
latitudes_missing = []
for index in range(0, len(df_missing_coordenates_cp), BATCH_SIZE):
    # We obtain the longitudes and latitudes for the current batch
    coordinates = df_missing_coordenates_cp['first_postcode'].iloc[index:index + BATCH_SIZE].apply(get_longitude_latitude)
    lat_missing, lon_missing = zip(*coordinates)  # We separate the latitudes and longitudes
    latitudes_missing.extend(lat_missing)
    longitudes_missing.extend(lon_missing)

# We add the columns for latitude and longitude to the DataFrame
df_missing_coordenates_cp['latitude'] = latitudes_missing
df_missing_coordenates_cp['longitude'] = longitudes_missing

In [82]:
df_missing_coordenates_cp2 = df_missing_coordenates_cp[(df_missing_coordenates_cp['latitude'].isnull()) |
                          (df_missing_coordenates_cp['longitude'].isnull())]
df_missing_coordenates_cp2

Unnamed: 0,url,postcode,name,first_postcode,latitude,longitude
603,https://de.wikipedia.org/wiki/G%C3%B6ttingen,37001–37099,Göttingen,37099,,


A lookup of the remaining city at https://worldpostalcode.com/lookup gives 37073 as the postcode for Göttingen.

In [86]:
df_missing_coordenates_cp_2 = df_missing_coordenates_cp2.drop(['latitude','longitude'],axis=1)
df_missing_coordenates_cp_2

Unnamed: 0,url,postcode,name,first_postcode
603,https://de.wikipedia.org/wiki/G%C3%B6ttingen,37001–37099,Göttingen,37099


In [87]:
df_missing_coordenates_cp_2["first_postcode"] = 37073
df_missing_coordenates_cp_2

Unnamed: 0,url,postcode,name,first_postcode
603,https://de.wikipedia.org/wiki/G%C3%B6ttingen,37001–37099,Göttingen,37073


In [88]:
df_missing_coordenates_cp2_cp = df_missing_coordenates_cp_2.copy()
longitudes_missing2 = []
latitudes_missing2 = []
for index in range(0, len(df_missing_coordenates_cp2_cp), BATCH_SIZE):
    # We obtain the longitudes and latitudes for the current batch
    coordinates2 = df_missing_coordenates_cp2_cp['first_postcode'].iloc[index:index + BATCH_SIZE].apply(get_longitude_latitude)
    lat_missing2, lon_missing2 = zip(*coordinates2)  # We separate the latitudes and longitudes
    latitudes_missing2.extend(lat_missing2)
    longitudes_missing2.extend(lon_missing2)

# We add the columns for latitude and longitude to the DataFrame
df_missing_coordenates_cp2_cp['latitude'] = latitudes_missing2
df_missing_coordenates_cp2_cp['longitude'] = longitudes_missing2
df_missing_coordenates_cp2_cp

Unnamed: 0,url,postcode,name,first_postcode,latitude,longitude
603,https://de.wikipedia.org/wiki/G%C3%B6ttingen,37001–37099,Göttingen,37073,51.534202,9.935047


In [92]:
df_missing_coordenates_cp2 = df_missing_coordenates_cp.dropna()
df_all2 = df_all.dropna()

In [93]:
df_missing_coordenates_cp2

Unnamed: 0,url,postcode,name,first_postcode,latitude,longitude
9,https://de.wikipedia.org/wiki/Adenau,53511–53518,Adenau,53518,50.382586,6.929417
176,https://de.wikipedia.org/wiki/Bad_Wildungen,34521–34537,Bad Wildungen,34537,51.110486,9.116245
233,https://de.wikipedia.org/wiki/Bexbach,66441–66450,Bexbach,66450,49.359519,7.261806
379,https://de.wikipedia.org/wiki/Dettelbach,"97335, 97337",Dettelbach,97337,49.805137,10.153645
416,https://de.wikipedia.org/wiki/Dransfeld,3712537127,Dransfeld,37127,51.488141,9.747195
530,https://de.wikipedia.org/wiki/Frankenau,35109–35110,Frankenau,35110,51.097461,8.921511
679,https://de.wikipedia.org/wiki/Gro%C3%9F-Gerau,64501–64521,Groß-Gerau,64521,49.90674,8.480224
746,https://de.wikipedia.org/wiki/Heidenau_(Sachsen),01801–01809,Heidenau,1809,50.960449,13.854582
789,https://de.wikipedia.org/wiki/Hessisch_Lichtenau,37230–37235,Hessisch Lichtenau,37235,51.203117,9.712111
853,https://de.wikipedia.org/wiki/Ingelheim_am_Rhein,"55216, 55218, 55262, 55263",Ingelheim am Rhein,55263,49.97744,8.118287


Joining the three dataframes into one that holds the required data

In [105]:
df_final = pd.concat([df_all2,df_missing_coordenates_cp2,df_missing_coordenates_cp2_cp])

In [106]:
df_all2.shape,df_missing_coordenates_cp2.shape,df_missing_coordenates_cp2_cp.shape

((2037, 6), (18, 6), (1, 6))

In [107]:
df_final.shape

(2056, 6)

In [108]:
df_final = df_final.drop(['url'],axis=1)

In [109]:
df_final.rename(columns={'first_postcode': 'searched_postcode'}, inplace=True)

In [110]:
df_final.head()

Unnamed: 0,postcode,name,searched_postcode,latitude,longitude
0,"35614,35630 (Heinrichsegen)Vorlage:Infobox Gem...",Aßlar,35614,50.593438,8.450166
1,78267,Aach,78267,47.84205,8.854742
2,52062–52080,Aachen,52062,50.776433,6.08667
3,"73430–73434, 73453",Aalen,73430,48.838746,10.08562
4,91183,Abenberg,91183,49.237606,10.951096


In [111]:
df_final.to_csv('city_details.csv', index=False)
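
Postcodes in eastern Germany begin with 0, and that leading zero is lost whenever a postcode is parsed as a number, for instance when `pd.read_csv` infers an integer column. One option is to read the column as text with `pd.read_csv('city_details.csv', dtype={'searched_postcode': str})`. Another is to normalise values afterwards, as in the minimal sketch below (the helper name `normalize_postcode` is an assumption for this example).

```python
def normalize_postcode(value):
    """Return a postcode as a 5-character string, restoring leading zeros."""
    text = str(value).strip()
    # Floats such as 37073.0 appear when a numeric column contains missing values
    if text.endswith(".0"):
        text = text[:-2]
    if not text.isdigit() or len(text) > 5:
        raise ValueError(f"not a German postcode: {value!r}")
    return text.zfill(5)

print(normalize_postcode(1945))     # 01945
print(normalize_postcode("06333"))  # 06333
```

Keeping postcodes as five-character strings also makes equality checks against user input unambiguous.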

## Write a function, that takes a postcode or city name and radius (in km) as input and returns all postcodes within the radius.

In [112]:
df_city_details = pd.read_csv('city_details.csv')
df_city_details.head()

Unnamed: 0,postcode,name,searched_postcode,latitude,longitude
0,"35614,35630 (Heinrichsegen)Vorlage:Infobox Gem...",Aßlar,35614,50.593438,8.450166
1,78267,Aach,78267,47.84205,8.854742
2,52062–52080,Aachen,52062,50.776433,6.08667
3,"73430–73434, 73453",Aalen,73430,48.838746,10.08562
4,91183,Abenberg,91183,49.237606,10.951096


In [113]:
df_city_details['name'].value_counts()

Waldenburg         2
Arnstein           2
Lorch              2
Freudenberg        2
Lichtenfels        2
                  ..
Guben              1
Gronau (Westf.)    1
Gronau (Leine)     1
Groitzsch          1
Göttingen          1
Name: name, Length: 2049, dtype: int64

In [114]:
(df_city_details['name'].value_counts() > 1).sum()

7

Seven city names occur more than once, so a name alone does not always identify a single city.
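
Because some names are shared by two cities, a dictionary keyed only by city name can hold just one of them. A possible workaround, sketched here with the standard library, is to keep a list of entries per name and flag the ambiguous ones. The function name `group_coords_by_name` is hypothetical, and the rows below are placeholder values, not scraped data.

```python
from collections import defaultdict

def group_coords_by_name(rows):
    """Map each city name to a list of (postcode, (lat, lon)) entries."""
    grouped = defaultdict(list)
    for name, postcode, lat, lon in rows:
        grouped[name].append((postcode, (lat, lon)))
    return dict(grouped)

# Placeholder rows: two different cities share the name "Lorch"
rows = [
    ("Lorch", "11111", 1.0, 2.0),
    ("Lorch", "22222", 3.0, 4.0),
    ("Aachen", "33333", 5.0, 6.0),
]

grouped = group_coords_by_name(rows)
ambiguous = sorted(name for name, entries in grouped.items() if len(entries) > 1)
print(ambiguous)  # ['Lorch']
```

In the notebook, the rows could come from `df_city_details[['name', 'searched_postcode', 'latitude', 'longitude']].itertuples(index=False)`, and an ambiguous name could then be resolved by asking for the postcode as well.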

In [115]:
df_city_details['coordinate'] = df_city_details.apply(
                                                      lambda row:
                                                      (row["latitude"] , row["longitude"]),
                                                      axis=1,
                                                  )
df_city_details.head()

Unnamed: 0,postcode,name,searched_postcode,latitude,longitude,coordinate
0,"35614,35630 (Heinrichsegen)Vorlage:Infobox Gem...",Aßlar,35614,50.593438,8.450166,"(50.59343775601734, 8.450166175038245)"
1,78267,Aach,78267,47.84205,8.854742,"(47.84204984036061, 8.854742132732317)"
2,52062–52080,Aachen,52062,50.776433,6.08667,"(50.77643283176553, 6.086669673288873)"
3,"73430–73434, 73453",Aalen,73430,48.838746,10.08562,"(48.83874607033844, 10.085620109546616)"
4,91183,Abenberg,91183,49.237606,10.951096,"(49.23760620916201, 10.951095987821228)"


In [116]:
# Map each city name to its (latitude, longitude) tuple.
# For names that occur more than once, only one entry survives (the last one).
city_coords = dict(zip(df_city_details['name'], df_city_details['coordinate']))


In [117]:
# Output commented out to improve readability

# city_coords

# {'Aßlar': (50.59343775601734, 8.450166175038245),
#  'Aach': (47.84204984036061, 8.854742132732317),
#  'Aachen': (50.77643283176553, 6.086669673288873),
#  'Aalen': (48.83874607033844, 10.085620109546616),
#  'Abenberg': (49.23760620916201, 10.951095987821228),
#  'Abensberg': (48.81903613154418, 11.853234060115607),
#  ...}

In [120]:
from geopy.distance import geodesic

def closer_cities(limit_distance: float,
                  origin_city=None,
                  postcode=None):
    """Return (city, distance in km) pairs within limit_distance of the origin."""
    if origin_city is None and postcode is None:
        print("You must provide limit_distance and (postcode or origin_city) parameters")
        return None

    if origin_city is None:
        # Resolve the postcode to a city name. Both sides are compared as
        # zero-padded strings, so 1945, "1945" and "01945" all match.
        codes = df_city_details['searched_postcode'].astype(str).str.zfill(5)
        matches = df_city_details[codes == str(postcode).zfill(5)]
        if matches.empty:
            print("Postal code not found")
            return None
        origin_city = matches['name'].iloc[0]

    origin_coords = city_coords.get(origin_city)
    if origin_coords is None:
        print("City coordinates not found")
        return None

    nearby = []
    for city, coords in city_coords.items():
        if city != origin_city:
            distance = geodesic(origin_coords, coords).kilometers
            if distance <= limit_distance:
                nearby.append((city, distance))
    return nearby

# Example (the result is stored under a new name so the function is not overwritten):
nearby_cities = closer_cities(limit_distance=20, origin_city='Berlin')
nearby_cities

[('Bernau bei Berlin', 19.921628520655517),
 ('Hennigsdorf', 16.976407350168405),
 ('Hohen Neuendorf', 17.401726999539864),
 ('Teltow', 17.063880072059305),
 ('Werneuchen', 14.787264345166232)]
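
The function above returns city names with their geodesic distances, whereas the task asks for postcodes within the radius. The sketch below shows one way to return postcodes directly, using the haversine formula so that it runs without geopy. Haversine treats the Earth as a sphere, so its distances differ slightly from geopy's ellipsoidal `geodesic`. The names `haversine_km` and `postcodes_within` are hypothetical, and the coordinates are the rounded values shown in the table above.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def postcodes_within(origin, radius_km, places):
    """Return the postcodes in places (postcode -> (lat, lon)) within radius_km, nearest first."""
    hits = [(haversine_km(origin, coords), code) for code, coords in places.items()]
    return [code for dist, code in sorted(hits) if dist <= radius_km]

places = {
    "78267": (47.84205, 8.854742),    # Aach
    "52062": (50.776433, 6.08667),    # Aachen
    "73430": (48.838746, 10.08562),   # Aalen
    "91183": (49.237606, 10.951096),  # Abenberg
}

# Postcodes within 100 km of Aalen (the origin itself is included at distance 0)
print(postcodes_within(places["73430"], 100, places))
```

Looping over every stored coordinate is linear in the number of cities, which is acceptable for about two thousand rows.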

In [121]:
!pip freeze > requirements.txt