# <span style="color: navy"> Project #2 – DS Tools </span>

### <span style="color: navy"> Background </span>

In this project we explore and visualize facts about mountains and peaks around the world.  
The project is based entirely on data from the site PeakWare.com,  
which bills itself as the “World Mountain Encyclopedia”.

# <span style="color: green">Part I – pre-processing</span>

In [33]:
import re
import requests
from bs4 import BeautifulSoup as bs
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
from time import time
from datetime import datetime

from mpl_toolkits.basemap import Basemap
# import networkx as nx

matplotlib.style.use('ggplot')

# %matplotlib inline
%matplotlib notebook
# I use the notebook backend instead of inline so the graphs can be zoomed interactively
from os import getcwd
pd.set_option('notebook_repr_html', True)

#### Make Jupyter display every expression result in a cell, not only the last one, without a print statement

In [34]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [35]:
# limit the displayed column width
pd.options.display.max_colwidth = 20
# control float display precision (keep 2 digits after the decimal point)
pd.options.display.precision = 2

### <span style="color: navy  ">General Functions</span>

In [36]:
def print_tag (tags) :
    for i,tag in enumerate(tags, start=0):
        print "***{:03}***".format(i),
        print str(tag).replace('\n', '').strip(),
        
def extract_tag_string(tag_tr,name):
    return unicode(tag_tr.find(name='td').string) if name in str(tag_tr) else None

### <span style="color: navy  ">Extracting continents list</span>

In the next block, we access [peakware.com][1] and extract the list of continent codes.  
This list will allow us to collect the code of every peak on the site.

[1]: https://www.peakware.com/peaks.php "peakware.com"

In [37]:
# url: It is the website/page you want to open.
url = "https://www.peakware.com/peaks.php"
# Getting the webpage, creating a Response object.
resp = requests.get(url)
# Extracting the source code of the page.
data = resp.text
# Passing the source code to Beautiful Soup to create a BeautifulSoup object for it.
soup = bs(data, 'lxml')

##### 1st way with split

In [38]:
continents_list=[]
attr= {'id':'contList'}
continents_tags= soup.find(name ='ul',attrs=attr).find_all(name ='li')
# print_tag (continents_tags) 
continents_list =[(str((tag.a)).split('choice=')[1]).split('">')[0] for tag in continents_tags]
continents_list

['AfA', 'AnA', 'AsA', 'AuA', 'EuA', 'NoA', 'SoA']

##### 2nd way with re

In [9]:
cont_list=[]
continents_tags = soup.find_all('a')
for tag in continents_tags:
    x = re.findall(r'href="peaks\.php\?choice=\w{3}', str(tag))
    if len (x) :
        cont_list.append(x)
cont_list2 = [str(continent)[-5:-2]  for continent in cont_list [::2]]
cont_list2 


['AfA', 'AnA', 'AsA', 'AuA', 'EuA', 'NoA', 'SoA']
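
A capture group can return the three-letter code directly, which avoids slicing the string form of the match list. The following is a minimal sketch on a made-up HTML fragment; `sample_html` and the exact shape of the links are assumptions for illustration, not a copy of the live page.

```python
import re

# Hypothetical markup, shaped like the links the notebook expects to find.
sample_html = (
    '<ul id="contList">'
    '<li><a href="peaks.php?choice=AfA">Africa</a></li>'
    '<li><a href="peaks.php?choice=AsA">Asia</a></li>'
    '<li><a href="peaks.php?choice=EuA">Europe</a></li>'
    '</ul>'
)

# The parentheses form a capture group, so findall returns only the code itself.
continent_pattern = re.compile(r'href="peaks\.php\?choice=(\w{3})"')
continent_codes = continent_pattern.findall(sample_html)
print(continent_codes)
```

If the same link appears more than once on a page, the resulting list would still need de-duplication.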

### <span style="color: indigo ">Extracting peak list</span>

In the next block, we iterate through the continents list in order to create "peaks_code_list".

In [39]:
peaks_code_list = []
for continent in continents_list:
    url_continents = 'https://www.peakware.com/peaks.php'+'?choice='+continent
    resp = requests.get(url_continents)
    soup = bs(resp.text, 'lxml')
    peaks_tags = soup.find(name='ul', id='peakList').find_all(name ='li')
    # print_tag(peaks_tags)
    peaks_code_list.extend([(str((tag.a)).split('php?pk=')[1]).split('">')[0] for tag in peaks_tags])
print len (peaks_code_list)

4194
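
The codes above are cut out of the `href` with `split`. As an alternative sketch, the standard library can parse the query string instead, which does not depend on what follows the value in the markup. `query_value` is a hypothetical helper introduced here, and the sample `href` values are assumed to look like the ones the notebook handles.

```python
try:
    from urllib.parse import urlparse, parse_qs  # Python 3
except ImportError:
    from urlparse import urlparse, parse_qs      # Python 2


def query_value(href, key):
    """Return the first value of `key` in the query string of `href`, or None."""
    values = parse_qs(urlparse(href).query).get(key)
    return values[0] if values else None


peak_code = query_value('peaks.php?pk=1387', 'pk')
continent_code = query_value('https://www.peakware.com/peaks.php?choice=AfA', 'choice')
missing = query_value('peaks.php', 'pk')  # no query string at all
```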


### <span style="color: indigo">Main - Extracting peaks attributes</span> 

In the next block, we iterate through "peaks_code_list",  
accessing each peak's web page and extracting every attribute and its value (plus "peak name" and "peak code").  
The data is inserted into a dictionary, which is appended to a data frame as a new row.  
At the end, all data will be exported to a CSV file.
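
Appending one row at a time to a data frame copies the frame on every call (and `DataFrame.append` was removed in pandas 2.0). A common alternative, sketched here with made-up records, is to collect the per-peak dictionaries in a list and build the frame once. The record keys and values below are assumptions, not scraped data.

```python
import pandas as pd

# Hypothetical scraped records; real pages expose different attribute sets,
# so the dictionaries do not share all keys.
scraped_records = [
    {'peak_code': '1', 'name_of_peak': 'Peak A', 'Elevation (feet):': '10,000'},
    {'peak_code': '2', 'name_of_peak': 'Peak B', 'Elevation (meters):': '3,000'},
]

rows = []
for record in scraped_records:
    rows.append(record)  # appending to a list does not copy the earlier rows

# Build the frame once; a key missing from a record becomes NaN in that row.
df_sketch = pd.DataFrame(rows)
```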

In [40]:
# main 1st
df = pd.DataFrame()
df_temp = pd.DataFrame()
peaks_df_list = []
t_1 = time()

for i,peak in enumerate(peaks_code_list, start=0):
    temp_peak_dict = dict()
    print "{:05} - peak code : {} ".format (i,peak)   
     
    url_peak = 'https://www.peakware.com/peaks.php?pk='+peak
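    try:
        resp = requests.get(url_peak)
    except requests.exceptions.ConnectionError:
        # skip this peak; otherwise the previous response would be parsed again
        print "connection failed for peak code : {}".format(peak)
        continue
    soup = bs(resp.text, 'lxml')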
    
    #     initialize the 1st attribute peak_code
    temp_peak_dict['peak_code']=str(peak) # update
    #     initialize the 2nd attribute name_of_peak
    temp_peak_dict['name_of_peak'] = unicode(soup.find(name='h1').string)

    #     initialize the rest of the attributes depending on the tags available
    tags = soup.find(name='table').find_all(name ='tr')   
#     print_tag(tags)
    
    for tag_tr in tags :
        temp_peak_dict[str(tag_tr.find(name='th').string).strip()] = unicode(tag_tr.find(name='td').string)
    
#     append the new dictionary as a row in the dataframe
    df = df.append(temp_peak_dict, ignore_index=True)

#     measure the running time
t_2 = time()
print "it took  {:.3f} seconds".format(t_2-t_1)

# it took  4321.018 seconds
# it took  42.908 seconds
# it took  33.116 seconds
# it took  29.796 seconds
# it took  27.725 seconds
# it took  4073.604 seconds

00000 - peak code : 1387
00001 - peak code : 2483
00002 - peak code : 3122
...
04180 - peak code : 4274
04181 - peak code : 2747
04182 - peak code : 561
...

# <span style="color: indigo">Adjusting data type and values</span> 

In the next block, we clean up the column names, data types, and values so the data is easier to work with.

In [113]:
# create a copy of data frame in order to preserve the original data untouched
peaks_df = df.copy()

# arrange the columns names
peaks_df.columns = peaks_df.columns.str.replace(':', '')
peaks_df.columns = peaks_df.columns.str.replace(' ', '_')
peaks_df.columns = peaks_df.columns.str.replace('(', '')
peaks_df.columns = peaks_df.columns.str.replace(')', '')
keys = ['peak_code','name_of_peak','Elevation_feet','Elevation_meters','Continent','Country','State','Province',                      
        'Range/Region','Latitude','Longitude','Difficulty','Best_months_for_climbing',
        'Year_first_climbed','First_successful_climbers','Nearest_major_airport',
        'Convenient_Center','Volcanic_status','Most_recent_eruption']
peaks_df.index.name = "row_num"
peaks_df = peaks_df.reindex(columns=keys)
peaks_df.columns = peaks_df.columns.str.lower()
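
The four `replace` calls and the final lowercasing can also be expressed as one small function applied to each name. This is a sketch only: `clean_column` is a hypothetical helper, and the raw header strings are assumed examples of what the scraped table headers look like.

```python
import re


def clean_column(name):
    """Drop ':', '(' and ')', turn spaces into '_', and lowercase the result."""
    return re.sub(r'[:()]', '', name).strip().replace(' ', '_').lower()


raw_columns = ['Elevation (feet):', 'Range/Region:', 'Year first climbed:', 'peak_code']
cleaned = [clean_column(c) for c in raw_columns]
print(cleaned)
```

On a data frame, the same function could be applied with `frame.columns = [clean_column(c) for c in frame.columns]`.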

>**elevation_feet/meters :** If both elevations are missing, then drop the peak record.

In [114]:
no_elevation_list = peaks_df[peaks_df.elevation_feet.isnull() & peaks_df.elevation_meters.isnull()].index.tolist()
peaks_df.drop(no_elevation_list, inplace=True)
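
The same rule can be written with `dropna`, which avoids building the index list by hand. A minimal sketch on made-up rows follows; note that only true missing values (`None`/`NaN`) count, so a literal string such as `'None'` would not be dropped.

```python
import pandas as pd

sample = pd.DataFrame({
    'name_of_peak': ['A', 'B', 'C'],
    'elevation_feet': ['10,000', None, None],
    'elevation_meters': [None, '3,000', None],
})

# how='all' drops a row only when every column listed in subset is missing.
kept = sample.dropna(subset=['elevation_feet', 'elevation_meters'], how='all')
```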

>**elevation_feet/meters :** If only one of the elevations is given, then fill the missing data

In [115]:
f_t_m =0.3048 # meter = feet * 0.3048
peaks_df.elevation_feet = peaks_df.elevation_feet.str.replace(',', '')
peaks_df.elevation_meters = peaks_df.elevation_meters.str.replace(',', '')
peaks_df.elevation_feet = pd.to_numeric(peaks_df.elevation_feet,downcast='float', errors='coerce')
peaks_df.elevation_meters = pd.to_numeric(peaks_df.elevation_meters,downcast='float', errors='coerce')

In [116]:
null_meters = peaks_df.elevation_meters.isnull()
# 1st option using .loc
peaks_df.loc[null_meters, 'elevation_meters']=peaks_df.loc[null_meters, 'elevation_feet']*f_t_m
# 2nd option using where
# peaks_df.elevation_meters = peaks_df.elevation_feet.where(peaks_df.elevation_meters.isnull(),other = peaks_df.elevation_meters/f_t_m)*f_t_m
# peaks_df[['elevation_meters','elevation_feet']] = peaks_df[['elevation_meters','elevation_feet']].applymap(lambda x: '%.1f' % x)
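
The cell above fills missing meters from feet. If the opposite gap can also occur, both directions can be filled with `fillna`, as in this sketch. The sample values are invented, and whether the scraped data actually contains rows with meters but no feet is an assumption.

```python
import numpy as np
import pandas as pd

FEET_TO_METERS = 0.3048  # one international foot is exactly 0.3048 m

sample = pd.DataFrame({
    'elevation_feet':   [10000.0, np.nan, 5000.0],
    'elevation_meters': [np.nan, 3000.0, 1524.0],
})

# fillna only touches the missing entries, so existing values are preserved.
sample['elevation_meters'] = sample['elevation_meters'].fillna(
    sample['elevation_feet'] * FEET_TO_METERS)
sample['elevation_feet'] = sample['elevation_feet'].fillna(
    sample['elevation_meters'] / FEET_TO_METERS)
```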

>**country :** If a peak is listed with more than a single country, then use the first country.

In [117]:
peaks_df.country = peaks_df.country.str.split('/').str[0]
# peaks_df.country = peaks_df.country.map(lambda x: np.nan if x=='None' else x)
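
A small worked example of the split, on invented values. The extra `str.strip()` is an addition that guards against spaces around the slash; whether the site ever formats countries that way is an assumption.

```python
import pandas as pd

countries = pd.Series(['Nepal/China', 'Italy / Switzerland', 'France', None])

# Keep the part before the first '/', then trim stray whitespace.
first_country = countries.str.split('/').str[0].str.strip()
```

Missing entries stay missing, so they can still be detected with `isnull()` afterwards.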

>**year_first_climbed :** If the date of the first ascent is recorded irregularly, try to reduce it so that only the year is preserved.  
If the value is too obscure, replace it with a missing value (NaN).

In [118]:
# replace non-informative entries (unknown, prehistoric, BC, ...) with NaN
null_first_climbed = peaks_df.year_first_climbed.loc[peaks_df.year_first_climbed.str.contains("Don't know|Unkown|Unclimbed|No Info|n/a|---|Unknow|Long|Hell of a long time ago|Na|Prehistoric|pre-history|unknon|no data|pre-written history|no idea|prehistoric|Not Sure|Ancient|Roman Times|Unkonwn|several|BC|B.C", case = False,na = False)==True]
peaks_df.loc[null_first_climbed.index, 'year_first_climbed'] = np.nan

# turn 197- or 197? to 1970 but not ?197 to 0197
peaks_df.year_first_climbed = peaks_df.year_first_climbed.str.replace(r'([0-9]{3,4})[\?\-]', r'\g<1>0')
# From the Python `re` documentation on replacement strings:
# \g<name> uses the substring matched by the group named `name`, as defined by the (?P<name>...) syntax.
# \g<number> uses the corresponding group number; \g<2> is therefore equivalent to \2,
# but isn't ambiguous in a replacement such as \g<2>0. \20 would be interpreted as a reference to group 20,
# not a reference to group 2 followed by the literal character '0'.
# The backreference \g<0> substitutes in the entire substring matched by the RE.
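
The difference between the two replacement forms can be checked on a few sample strings. This sketch reuses the pattern from the cell above; the input strings are illustrative.

```python
import re

pattern = r'([0-9]{3,4})[\?\-]'

# \g<1>0 means "group 1, followed by a literal 0".
fixed_question = re.sub(pattern, r'\g<1>0', '197?')
fixed_dash = re.sub(pattern, r'\g<1>0', '197-')

# A leading '?' is not followed by the digits-then-marker shape, so nothing changes.
untouched = re.sub(pattern, r'\g<1>0', '?197')

# \10 is read as a reference to group 10, which this pattern does not define.
group10_failed = False
try:
    re.sub(pattern, r'\10', '197?')
except re.error:
    group10_failed = True
```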

In [119]:
regex = re.compile('([0-9]{3,4})')
peaks_df.year_first_climbed = pd.to_numeric(peaks_df.year_first_climbed.str.extract(regex, expand=False),downcast='integer', errors='coerce')

In [120]:
# format 1916.0 as '1916'; this changes the column dtype from float to object (string)
peaks_df.year_first_climbed = peaks_df.year_first_climbed.map(lambda x: str(int(x)) if pd.notnull(x) else x)
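
Converting the years to strings is one way to get rid of the trailing `.0`. Another option, sketched below on invented values, is pandas' nullable integer dtype `'Int64'`, which keeps the column numeric while still allowing missing entries. This assumes pandas 0.24 or newer, where that dtype is available.

```python
import pandas as pd

raw_years = pd.Series(['1953', 'circa 1865', 'Unclimbed', None])

# Extract the first run of 3-4 digits; rows without one become NaN.
years = pd.to_numeric(raw_years.str.extract(r'([0-9]{3,4})', expand=False),
                      errors='coerce')

# 'Int64' (capital I) stores whole numbers together with missing values.
years_int = years.astype('Int64')
```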

In [121]:
# peaks_df.year_first_climbed = pd.to_datetime(peaks_df.year_first_climbed,format = '%Y', errors = 'raise')

>Update the types of the remaining columns

In [122]:
peaks_df.latitude = pd.to_numeric(peaks_df.latitude,downcast='float', errors='coerce')
peaks_df.longitude = pd.to_numeric(peaks_df.longitude,downcast='float', errors='coerce')
peaks_df.country = peaks_df.country.astype(str)

In [123]:
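# astype(str) turns None into 'None' and NaN into 'nan'; map both back to a real missing value
peaks_df.country = peaks_df.country.map(lambda x: np.nan if x in ('None', 'nan') else x)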

In [124]:
peaks_df.difficulty = peaks_df.difficulty.str.replace('Walk up', 'Walk Up')
peaks_df.difficulty = peaks_df.difficulty.str.replace('Basic Snow/Ice climb', 'Basic Snow/Ice Climb')

### <span style="color: indigo">Exporting to CSV file called "peaks.csv"</span> 

In [125]:
peaks_df.to_csv('peaks.csv', encoding='utf-8') 
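
To confirm that an export like this can be read back, including the named index and non-ASCII names, a round trip can be sketched as follows. The frame contents and the file name `peaks_sample.csv` are made up for illustration.

```python
import os
import tempfile

import pandas as pd

sample = pd.DataFrame({'name_of_peak': [u'Pe\u00f1a A', u'Peak B'],
                       'elevation_meters': [3048.0, 3000.0]})
sample.index.name = 'row_num'

csv_path = os.path.join(tempfile.mkdtemp(), 'peaks_sample.csv')
sample.to_csv(csv_path, encoding='utf-8')

# index_col restores the saved index instead of adding a fresh 0..n-1 one.
restored = pd.read_csv(csv_path, index_col='row_num', encoding='utf-8')
```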