# Finding historical maps

## This notebook aims to:
    - [X] Find historical map files for Brazilian states
    - [X] Download them
    - [X] Find out which map relates to which election

## Notes:
    - The links and IFrames here were valid as of 2017-01-10 and may stop working in the future

In [2]:
import os
import wget

In [3]:
from IPython.display import IFrame

## Show where to find historical maps of Brazil

On the mapas.ibge site (as of 2017-01-10), this link looked promising:

In [4]:
IFrame(
    'http://mapas.ibge.gov.br/politico-administrativo/2012-05-31-17-03-17.html',
    '100%', 400
)

However, when I try to download the files, the server responds with "550 Failed to change directory."

So, at this other link there is a promising PDF:

In [6]:
IFrame(
    'http://www.ibge.gov.br/home/geociencias/geografia/default_evolucao.shtm',
    '100%', 400
)

The file "Evolução da divisão territorial" contains what I am looking for:

In [7]:
# wget.download(
#     'ftp://geoftp.ibge.gov.br/organizacao_do_territorio/estrutura_territorial/evolucao_da_divisao_territorial_do_brasil_1872_2010/evolucao_da_divisao_territorial_mapas.pdf',
#     '../data/'
# )

In [8]:
os.listdir('../data')

['evolucao_da_divisao_territorial_mapas.pdf',
 'maps_relation.json',
 'historical_maps',
 'maps_conversion',
 'deliver_json']

In [9]:
IFrame(
    '../data/evolucao_da_divisao_territorial_mapas.pdf',
    '100%', 400
)

But since the file is a PDF, it is too hard to work with.

Finally, I found a good source, "https://earthworks.stanford.edu/",
which has a great database and search system:

https://earthworks.stanford.edu/?commit=Limit&q=state+brazil&range[solr_year_i][begin]=1940&range[solr_year_i][end]=2017&search_field=dummy_range#documents

For some reason the IFrame does not load correctly, but the files of interest are the ones under the links of the form 'State Boundaries: Brasil, < year >'

## Show the final chosen source

From the site above I extract the file links:

I need to create a new map for 1956, when GP (Guaporé) was renamed RO (Rondônia). For now the 1956 entry points at the same archive as 1950.

One more note, on the supposed 1960 map: I relabel it as 1962, because what should be Rio Branco appears in it as Roraima, even though RR was only created in 1962.

In [10]:
maps = [
    {'year': 1940, 'url': 'http://stacks.stanford.edu/file/druid:dm443sj2820/data.zip'},
    {'year': 1950, 'url': 'http://stacks.stanford.edu/file/druid:yw498rc4263/data.zip'},
    {'year': 1956, 'url': 'http://stacks.stanford.edu/file/druid:yw498rc4263/data.zip'},
    {'year': 1962, 'url': 'http://stacks.stanford.edu/file/druid:zf548hv4473/data.zip'},
    {'year': 1970, 'url': 'http://stacks.stanford.edu/file/druid:kx233jf3889/data.zip'},
    {'year': 1980, 'url': 'http://stacks.stanford.edu/file/druid:nf449db8341/data.zip'},
    {'year': 1991, 'url': 'http://stacks.stanford.edu/file/druid:ys298mq8577/data.zip'}   
]
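The 1950 and 1956 entries point at the same archive, which matches the note above that the 1956 map still has to be built by hand. A small sketch for spotting such shared sources, repeating the `maps` list so it runs on its own (`years_by_url` and `shared` are names introduced here for illustration):

```python
from collections import defaultdict

maps = [
    {'year': 1940, 'url': 'http://stacks.stanford.edu/file/druid:dm443sj2820/data.zip'},
    {'year': 1950, 'url': 'http://stacks.stanford.edu/file/druid:yw498rc4263/data.zip'},
    {'year': 1956, 'url': 'http://stacks.stanford.edu/file/druid:yw498rc4263/data.zip'},
    {'year': 1962, 'url': 'http://stacks.stanford.edu/file/druid:zf548hv4473/data.zip'},
    {'year': 1970, 'url': 'http://stacks.stanford.edu/file/druid:kx233jf3889/data.zip'},
    {'year': 1980, 'url': 'http://stacks.stanford.edu/file/druid:nf449db8341/data.zip'},
    {'year': 1991, 'url': 'http://stacks.stanford.edu/file/druid:ys298mq8577/data.zip'},
]

# Group map years by the archive they are downloaded from.
years_by_url = defaultdict(list)
for entry in maps:
    years_by_url[entry['url']].append(entry['year'])

# Keep only archives that back more than one map year.
shared = {url: years for url, years in years_by_url.items() if len(years) > 1}
print(shared)
```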

## Download the maps

In [9]:
base_folder = '../data/historical_maps'

In [10]:
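# Create one sub-folder per map year and download each archive into it.
os.makedirs(base_folder, exist_ok=True)  # does not fail if the folder exists
for map_info in maps:  # avoid shadowing the built-in map()
    folder = f'{base_folder}/{map_info["year"]}'
    os.makedirs(folder, exist_ok=True)
    r = wget.download(map_info['url'], folder)
    print(r)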

../data/historical_maps/1940/data.zip
../data/historical_maps/1950/data.zip
../data/historical_maps/1956/data.zip
../data/historical_maps/1962/data.zip
../data/historical_maps/1970/data.zip
../data/historical_maps/1980/data.zip
../data/historical_maps/1991/data.zip
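Re-running a download cell fetches every archive again. Below is a minimal sketch of a helper that skips archives already on disk. `download_if_missing` and its `fetch` parameter are hypothetical names, and `urllib.request.urlretrieve` from the standard library stands in for `wget.download` here.

```python
import os
import urllib.request


def download_if_missing(url, folder, fetch=urllib.request.urlretrieve):
    """Download url into folder unless the target file is already there.

    `fetch` is any callable taking (url, target_path); it is a parameter
    only so the helper can be exercised without network access.
    """
    os.makedirs(folder, exist_ok=True)
    # The file name is the last path component of the URL, e.g. 'data.zip'.
    target = os.path.join(folder, os.path.basename(url))
    if not os.path.exists(target):
        fetch(url, target)
    return target
```

In the loop above it could replace the download call, for example `download_if_missing(entry['url'], f'{base_folder}/{entry["year"]}')` for each `entry` in `maps`.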


In [11]:
os.listdir(base_folder)

['1980', '1950', '1940', '1970', '1962', '1991', '1956']

## Unzip the files

In [12]:
from glob import glob

In [13]:
from zipfile import ZipFile

for folder in glob(f'{base_folder}/*'):
    with ZipFile(f'{folder}/data.zip', 'r') as zip_file:
        zip_file.extractall(folder)

In [14]:
glob(f'{base_folder}/*/*')

['../data/historical_maps/1980/04_limiteestadual1980.prj',
 '../data/historical_maps/1980/data.zip',
 '../data/historical_maps/1980/04_limiteestadual1980.sbn',
 '../data/historical_maps/1980/04_limiteestadual1980.dbf',
 '../data/historical_maps/1980/04_limiteestadual1980-iso19110.xml',
 '../data/historical_maps/1980/04_limiteestadual1980.shp.xml',
 '../data/historical_maps/1980/04_limiteestadual1980.sbx',
 '../data/historical_maps/1980/04_limiteestadual1980-fgdc.xml',
 '../data/historical_maps/1980/04_limiteestadual1980-iso19139.xml',
 '../data/historical_maps/1980/04_limiteestadual1980.shx',
 '../data/historical_maps/1980/04_limiteestadual1980.shp',
 '../data/historical_maps/1950/04_limiteestadual1950-iso19139.xml',
 '../data/historical_maps/1950/04_limiteestadual1950-fgdc.xml',
 '../data/historical_maps/1950/data.zip',
 '../data/historical_maps/1950/04_limiteestadual1950.shp',
 '../data/historical_maps/1950/04_limiteestadual1950.shx',
 '../data/historical_maps/1950/04_limiteestadual1
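Each archive unpacks into a shapefile (`.shp` plus companions such as `.shx`, `.dbf` and `.prj`) and several metadata files. A sketch that maps each year folder to its `.shp` path, assuming one shapefile per folder as in the listing above; `shapefile_by_year` is a name introduced here, not part of the notebook.

```python
from pathlib import Path


def shapefile_by_year(base_folder):
    """Return {year: path to the .shp file} for each year sub-folder.

    Assumes folder names are map years and each folder holds one .shp;
    if a folder held several, the last one in sorted order would win.
    """
    result = {}
    for shp in sorted(Path(base_folder).glob('*/*.shp')):
        year = int(shp.parent.name)
        result[year] = shp
    return result
```

The `*/*.shp` pattern only matches names ending in `.shp`, so metadata files such as `04_limiteestadual1980.shp.xml` are left out.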

## List the presidential election years

In [11]:
IFrame(
    'https://pt.wikipedia.org/wiki/Lista_de_elei%C3%A7%C3%B5es_presidenciais_no_Brasil#toc',
    '100%', 400
)

In [16]:
republica_velha = [
    1891, 1894, 1898, 1902, 1906,
    1910, 1914, 1918, 1919, 1922,
    1926, 1930
]

era_vargas = [
    1934
]

republica_nova = [
    1945, 1950, 1955, 1960
]

regime_militar = [
    1964, 1966, 1969, 1974, 1978
]

nova_republica = [
    1985, 1989, 1994, 1998, 2002,
    2006, 2010, 2014
]

In [17]:
election_years_all = (
    republica_velha + era_vargas + republica_nova +
    regime_militar  + nova_republica
)

In [18]:
len(election_years_all)

30
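A quick sanity check on the combined list can catch a mistyped or duplicated year. A sketch, repeating the years from the cells above so it runs on its own (`is_sorted` and `has_duplicates` are illustrative names):

```python
election_years_all = [
    1891, 1894, 1898, 1902, 1906, 1910, 1914, 1918, 1919, 1922, 1926, 1930,
    1934,
    1945, 1950, 1955, 1960,
    1964, 1966, 1969, 1974, 1978,
    1985, 1989, 1994, 1998, 2002, 2006, 2010, 2014,
]

# The list should be in chronological order with no year listed twice.
is_sorted = election_years_all == sorted(election_years_all)
has_duplicates = len(set(election_years_all)) != len(election_years_all)
print(len(election_years_all), is_sorted, has_duplicates)  # 30 True False
```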

## Relate which map should be used for which election

Since I only have maps from 1940 onward, I will only relate elections starting from the República Nova (1945).

I will use the PDF file mentioned above, '../data/evolucao_da_divisao_territorial_mapas.pdf', and I have to assume that it contains all the relevant territorial changes.

With that, I will build ranges of years over which the map supposedly remains unchanged.

Note in particular 1956, which I created, and 1960, which I changed to 1962, as noted above alongside the download links.

In [19]:
years_listed = [1872, 1900, 1911, 1920, 1933, 1940, 1950, 1956, 1962, 1970, 1980, 1991, 2000, 2010]

ranges = [range(start, end) for start, end in zip(years_listed[:-1], years_listed[1:])]
ranges += [range(years_listed[-1], 2017 + 1)]

In [20]:
ranges

[range(1872, 1900),
 range(1900, 1911),
 range(1911, 1920),
 range(1920, 1933),
 range(1933, 1940),
 range(1940, 1950),
 range(1950, 1956),
 range(1956, 1962),
 range(1962, 1970),
 range(1970, 1980),
 range(1980, 1991),
 range(1991, 2000),
 range(2000, 2010),
 range(2010, 2018)]
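Python ranges are half-open: each period includes its start year and excludes its end year, so a year that appears in `years_listed` belongs to the period it starts. A quick check of the boundary around 1956 (the two variable names are illustrative):

```python
period_before = range(1950, 1956)
period_after = range(1956, 1962)

# 1955 still falls in the 1950 period; 1956 starts the next one.
print(1955 in period_before, 1956 in period_before, 1956 in period_after)
# True False True
```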

In [21]:
map_url = [{'url': map['url'], 'map_year': map['year']} for map in maps]
map_url

[{'map_year': 1940,
  'url': 'http://stacks.stanford.edu/file/druid:dm443sj2820/data.zip'},
 {'map_year': 1950,
  'url': 'http://stacks.stanford.edu/file/druid:yw498rc4263/data.zip'},
 {'map_year': 1956,
  'url': 'http://stacks.stanford.edu/file/druid:yw498rc4263/data.zip'},
 {'map_year': 1962,
  'url': 'http://stacks.stanford.edu/file/druid:zf548hv4473/data.zip'},
 {'map_year': 1970,
  'url': 'http://stacks.stanford.edu/file/druid:kx233jf3889/data.zip'},
 {'map_year': 1980,
  'url': 'http://stacks.stanford.edu/file/druid:nf449db8341/data.zip'},
 {'map_year': 1991,
  'url': 'http://stacks.stanford.edu/file/druid:ys298mq8577/data.zip'}]

In [22]:
# associate each election year with its period:
election_period = []
for year in election_years_all:
    for year_range in ranges:
        if year in year_range:
            election_period.append({'election_year': year, 'period': year_range})
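Because `years_listed` is sorted, the period for a given year can also be found by binary search instead of scanning every range. A sketch using the standard `bisect` module; `period_for` and `LAST_YEAR` are hypothetical names, not part of the notebook.

```python
from bisect import bisect_right

years_listed = [1872, 1900, 1911, 1920, 1933, 1940, 1950,
                1956, 1962, 1970, 1980, 1991, 2000, 2010]
LAST_YEAR = 2017  # same upper bound the notebook uses for the final range


def period_for(year):
    """Return the half-open range built from years_listed that contains year."""
    if not years_listed[0] <= year <= LAST_YEAR:
        raise ValueError(f'{year} is outside the covered periods')
    # Index of the last listed year that is <= year.
    i = bisect_right(years_listed, year) - 1
    end = years_listed[i + 1] if i + 1 < len(years_listed) else LAST_YEAR + 1
    return range(years_listed[i], end)


print(period_for(1960))  # range(1956, 1962)
```

For only 30 elections and 14 periods the nested loop is perfectly adequate; the binary search mainly makes the "one period per year" intent explicit.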

In [23]:
election_period

[{'election_year': 1891, 'period': range(1872, 1900)},
 {'election_year': 1894, 'period': range(1872, 1900)},
 {'election_year': 1898, 'period': range(1872, 1900)},
 {'election_year': 1902, 'period': range(1900, 1911)},
 {'election_year': 1906, 'period': range(1900, 1911)},
 {'election_year': 1910, 'period': range(1900, 1911)},
 {'election_year': 1914, 'period': range(1911, 1920)},
 {'election_year': 1918, 'period': range(1911, 1920)},
 {'election_year': 1919, 'period': range(1911, 1920)},
 {'election_year': 1922, 'period': range(1920, 1933)},
 {'election_year': 1926, 'period': range(1920, 1933)},
 {'election_year': 1930, 'period': range(1920, 1933)},
 {'election_year': 1934, 'period': range(1933, 1940)},
 {'election_year': 1945, 'period': range(1940, 1950)},
 {'election_year': 1950, 'period': range(1950, 1956)},
 {'election_year': 1955, 'period': range(1950, 1956)},
 {'election_year': 1960, 'period': range(1956, 1962)},
 {'election_year': 1964, 'period': range(1962, 1970)},
 {'electio

In [24]:
final_relation = []
for election in election_period:
    for map in map_url:
        if map['map_year'] in election['period']:
            final_relation.append({**map, **election})
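The nested loop keeps an election only when some map year falls inside its period, so elections whose period has no downloaded map are silently dropped. A small sketch that lists them, using a few (election year, period) pairs consistent with the ranges built above; `sample` and `unmatched` are names introduced here.

```python
map_years = [1940, 1950, 1956, 1962, 1970, 1980, 1991]

# A hand-picked subset of elections with the period each one falls in.
sample = [
    (1934, range(1933, 1940)),
    (1945, range(1940, 1950)),
    (1960, range(1956, 1962)),
    (1998, range(1991, 2000)),
    (2002, range(2000, 2010)),
]

# Elections whose period contains no map year get no entry in the join.
unmatched = [year for year, period in sample
             if not any(m in period for m in map_years)]
print(unmatched)  # [1934, 2002]
```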

These are the reliable relations (given the assumptions already stated):

In [25]:
final_relation

[{'election_year': 1945,
  'map_year': 1940,
  'period': range(1940, 1950),
  'url': 'http://stacks.stanford.edu/file/druid:dm443sj2820/data.zip'},
 {'election_year': 1950,
  'map_year': 1950,
  'period': range(1950, 1956),
  'url': 'http://stacks.stanford.edu/file/druid:yw498rc4263/data.zip'},
 {'election_year': 1955,
  'map_year': 1950,
  'period': range(1950, 1956),
  'url': 'http://stacks.stanford.edu/file/druid:yw498rc4263/data.zip'},
 {'election_year': 1960,
  'map_year': 1956,
  'period': range(1956, 1962),
  'url': 'http://stacks.stanford.edu/file/druid:yw498rc4263/data.zip'},
 {'election_year': 1964,
  'map_year': 1962,
  'period': range(1962, 1970),
  'url': 'http://stacks.stanford.edu/file/druid:zf548hv4473/data.zip'},
 {'election_year': 1966,
  'map_year': 1962,
  'period': range(1962, 1970),
  'url': 'http://stacks.stanford.edu/file/druid:zf548hv4473/data.zip'},
 {'election_year': 1969,
  'map_year': 1962,
  'period': range(1962, 1970),
  'url': 'http://stacks.stanford.edu

Even so, I will use the last map (from 1991) for all elections since then, because I know empirically that the later changes are irrelevant to my application:

In [26]:
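final_relation = []
for election in election_period:
    # Periods starting in 2000 or later contain no map year, so elections
    # after 1998 fall back to the last map (1991).
    if election['election_year'] > 1998:
        final_relation.append({**map_url[-1], **election})
    # Pick the map whose year falls inside the election's period, if any
    # (nothing matches here for the fallback elections handled above).
    for map in map_url:
        if map['map_year'] in election['period']:
            final_relation.append({**map, **election})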
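Since this cell appends from two places, it is worth confirming that no election ends up with two maps. A sketch of such a check on a small hand-written sample; `duplicated_elections` is a hypothetical helper.

```python
from collections import Counter


def duplicated_elections(relations):
    """Return the election years that appear more than once in relations."""
    counts = Counter(rel['election_year'] for rel in relations)
    return sorted(year for year, n in counts.items() if n > 1)


sample = [
    {'election_year': 1998, 'map_year': 1991},
    {'election_year': 2002, 'map_year': 1991},
]
print(duplicated_elections(sample))  # []
```

Calling `duplicated_elections(final_relation)` after the cell above should likewise give an empty list if each election was matched once.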

In [27]:
final_relation

[{'election_year': 1945,
  'map_year': 1940,
  'period': range(1940, 1950),
  'url': 'http://stacks.stanford.edu/file/druid:dm443sj2820/data.zip'},
 {'election_year': 1950,
  'map_year': 1950,
  'period': range(1950, 1956),
  'url': 'http://stacks.stanford.edu/file/druid:yw498rc4263/data.zip'},
 {'election_year': 1955,
  'map_year': 1950,
  'period': range(1950, 1956),
  'url': 'http://stacks.stanford.edu/file/druid:yw498rc4263/data.zip'},
 {'election_year': 1960,
  'map_year': 1956,
  'period': range(1956, 1962),
  'url': 'http://stacks.stanford.edu/file/druid:yw498rc4263/data.zip'},
 {'election_year': 1964,
  'map_year': 1962,
  'period': range(1962, 1970),
  'url': 'http://stacks.stanford.edu/file/druid:zf548hv4473/data.zip'},
 {'election_year': 1966,
  'map_year': 1962,
  'period': range(1962, 1970),
  'url': 'http://stacks.stanford.edu/file/druid:zf548hv4473/data.zip'},
 {'election_year': 1969,
  'map_year': 1962,
  'period': range(1962, 1970),
  'url': 'http://stacks.stanford.edu

## Export the relations to JSON

In [28]:
import json

In [29]:
relation_copy = [{**rel} for rel in final_relation]
for relation in relation_copy:
    r = relation['period']
    relation['period'] = [r.start, r.stop]
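`range` objects are not JSON serializable, which is why each period is stored as a `[start, stop]` pair. A sketch of the round trip, showing how a reader of the file could rebuild the range (`record`, `serializable` and `restored` are illustrative names):

```python
import json

record = {'election_year': 1945, 'map_year': 1940, 'period': range(1940, 1950)}

# Replace the range with a plain [start, stop] list before dumping.
period = record['period']
serializable = {**record, 'period': [period.start, period.stop]}
text = json.dumps(serializable)

# Reading it back: unpack the pair into range() to recover the period.
loaded = json.loads(text)
restored = range(*loaded['period'])
print(restored == record['period'])  # True
```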

In [30]:
with open('../data/maps_relation.json', 'w') as f:
    json.dump(relation_copy, f)

In [31]:
os.listdir('../data/')

['evolucao_da_divisao_territorial_mapas.pdf',
 'historical_maps',
 'maps_relation.json']

In [32]:
%%bash
cat ../data/maps_relation.json

[{"url": "http://stacks.stanford.edu/file/druid:dm443sj2820/data.zip", "map_year": 1940, "election_year": 1945, "period": [1940, 1950]}, {"url": "http://stacks.stanford.edu/file/druid:yw498rc4263/data.zip", "map_year": 1950, "election_year": 1950, "period": [1950, 1956]}, {"url": "http://stacks.stanford.edu/file/druid:yw498rc4263/data.zip", "map_year": 1950, "election_year": 1955, "period": [1950, 1956]}, {"url": "http://stacks.stanford.edu/file/druid:yw498rc4263/data.zip", "map_year": 1956, "election_year": 1960, "period": [1956, 1962]}, {"url": "http://stacks.stanford.edu/file/druid:zf548hv4473/data.zip", "map_year": 1962, "election_year": 1964, "period": [1962, 1970]}, {"url": "http://stacks.stanford.edu/file/druid:zf548hv4473/data.zip", "map_year": 1962, "election_year": 1966, "period": [1962, 1970]}, {"url": "http://stacks.stanford.edu/file/druid:zf548hv4473/data.zip", "map_year": 1962, "election_year": 1969, "period": [1962, 1970]}, {"url": "http://stacks.stanford.edu/file/druid: