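## Collecting, analyzing, visualizing, and storing National Assembly member status data
* Extract names and IDs
* Extract detail-page information (save as JSON)
* Convert the detail records to a DataFrame
* Visualize (bar chart, histogram, pie chart, heatmap)
* Save to a table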

In [1]:
import requests
from bs4 import BeautifulSoup
import re


In [6]:
url = 'https://www.assembly.go.kr/assm/memact/congressman/memCond/memCondListAjax.do'

req_param_dict={
    'currentPage' : 1,
    'rowPerPage' : 300
}

res = requests.get(url, params=req_param_dict)

print(res.status_code)

if res.ok:
    soup=BeautifulSoup(res.text, 'html.parser')
    atag_list = soup.select('div.memberna_list dl dt a')
    #print(len(atag_list))
    
    member_id_list = []
    for atag in atag_list:
        href = atag['href']
        matched = re.search(r'(\d+)', href)
        if matched:
            # Append only on a match; otherwise a stale or undefined ID would be added
            member_id_list.append(matched.group(1))
    print(len(member_id_list))        
    print(member_id_list[:3])

200
295
['9771230', '9771142', '9771174']
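
The ID extraction above can be isolated into a small helper, which makes the "no digits in the link" case explicit. This is a minimal sketch: the function name `extract_member_id` and the sample `href` strings are hypothetical, and the real attribute format on the site may differ.

```python
import re

def extract_member_id(href):
    """Return the first run of digits in href, or None if there is none."""
    matched = re.search(r'\d+', href)
    return matched.group(0) if matched else None

# Hypothetical href values; the real attribute format on the site may differ.
hrefs = ["javascript:jsMemPop(9771230)", "#", "javascript:jsMemPop(9771142)"]

# Links without an ID are skipped instead of producing a bogus entry.
member_ids = [mid for mid in map(extract_member_id, hrefs) if mid is not None]
print(member_ids)
```

Returning `None` and filtering afterwards keeps the loop free of variables that are only sometimes assigned.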


In [8]:
import requests
from bs4 import BeautifulSoup
import re
from urllib.parse import urljoin

print('===> Scraping started')
member_detail_list = []
# Compile once: matches newlines, carriage returns, and tabs inside <dd> text
whitespace_pattern = re.compile(r'[\n\r\t]')

for idx, mem_id in enumerate(member_id_list, 1):
    detail_url = f'https://www.assembly.go.kr/assm/memPop/memPopup.do?dept_cd={mem_id}'
    print(idx, detail_url)
    res = requests.get(detail_url)
    if res.ok:
        soup = BeautifulSoup(res.text, 'html.parser')

        # Labels (<dt>) and values (<dd>) of the profile definition list
        dt_list = [dt_tag.text for dt_tag in soup.select('dl.pro_detail dt')]

        dd_list = []
        for dd_tag in soup.select('dl.pro_detail dd'):
            dd_text = whitespace_pattern.sub('', dd_tag.text.strip()).replace(" ", "")
            dd_list.append(dd_text)

        # dict holding one member's information
        member_detail_dict = dict(zip(dt_list, dd_list))

        for div_tag in soup.select('div.profile'):
            # Keys stay in Korean to match the labels scraped from the page
            member_detail_dict['이름'] = div_tag.find('h4').text  # name

            img_tag = div_tag.select('img')
            if img_tag:
                member_detail_dict['이미지'] = urljoin(detail_url, img_tag[0]['src'])  # photo URL

            member_detail_dict['생년월일'] = div_tag.select_one('li:nth-of-type(4)').text  # date of birth

        # Append this member's dict to the list
        member_detail_list.append(member_detail_dict)

print(len(member_detail_list))
print('===> Scraping finished')

===> Scraping started
1 https://www.assembly.go.kr/assm/memPop/memPopup.do?dept_cd=9771230
2 https://www.assembly.go.kr/assm/memPop/memPopup.do?dept_cd=9771142
3 https://www.assembly.go.kr/assm/memPop/memPopup.do?dept_cd=9771174
4 https://www.assembly.go.kr/assm/memPop/memPopup.do?dept_cd=9771233
5 https://www.assembly.go.kr/assm/memPop/memPopup.do?dept_cd=9771283
6 https://www.assembly.go.kr/assm/memPop/memPopup.do?dept_cd=9770933
7 https://www.assembly.go.kr/assm/memPop/memPopup.do?dept_cd=9771116
8 https://www.assembly.go.kr/assm/memPop/memPopup.do?dept_cd=9771276
9 https://www.assembly.go.kr/assm/memPop/memPopup.do?dept_cd=9771168
10 https://www.assembly.go.kr/assm/memPop/memPopup.do?dept_cd=9771007
11 https://www.assembly.go.kr/assm/memPop/memPopup.do?dept_cd=9771109
12 https://www.assembly.go.kr/assm/memPop/memPopup.do?dept_cd=9771180
13 https://www.assembly.go.kr/assm/memPop/memPopup.do?dept_cd=9770931
14 https://www.assembly.go.kr/assm/memPop/memPopup.do?dept_cd=9771224
15 https://www.a
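
One detail of the scraping cell is worth spelling out: `dict(zip(...))` silently drops items when the label and value lists differ in length. The sketch below shows the same cleanup and pairing with an explicit length check. The helper names `clean_value` and `pair_labels` and the raw sample strings are hypothetical, not taken from the page.

```python
import re

# Same characters the scraper removes: newlines, carriage returns, tabs, spaces
WHITESPACE = re.compile(r'[\n\r\t ]')

def clean_value(text):
    """Strip the text and drop all inner whitespace, as the scraper does."""
    return WHITESPACE.sub('', text.strip())

def pair_labels(labels, values):
    """Zip labels with cleaned values, refusing to silently drop unmatched items."""
    if len(labels) != len(values):
        raise ValueError(f'{len(labels)} labels but {len(values)} values')
    return dict(zip(labels, (clean_value(v) for v in values)))

# Hypothetical raw text as it might come out of the <dt>/<dd> tags
labels = ['정당', '선거구']
raw_values = ['\n\t국민의힘\r\n', ' 경남 창원시성산구 ']

record = pair_labels(labels, raw_values)
print(record)
```

Raising on a mismatch is one possible policy; logging the URL and skipping the record would work equally well for a long-running scrape.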

In [11]:
import json

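with open('국회의원현황정보.json', 'w', encoding='utf-8') as file:
    # ensure_ascii=False keeps the Korean text readable instead of \uXXXX escapes
    json.dump(member_detail_list, file, ensure_ascii=False)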
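
Because the records are mostly Korean text, the `ensure_ascii` option of the `json` module matters for how the saved file looks. A minimal sketch follows, using a single sample record and a temporary file path (both are illustrative assumptions, not the notebook's actual file).

```python
import json
import os
import tempfile

records = [{'이름': '강기윤', '정당': '국민의힘'}]

escaped = json.dumps(records)                        # non-ASCII becomes \uXXXX escapes
readable = json.dumps(records, ensure_ascii=False)   # Korean text is kept as-is

# Write and read back with an explicit encoding so the result does not
# depend on the platform's default.
path = os.path.join(tempfile.mkdtemp(), 'members.json')
with open(path, 'w', encoding='utf-8') as file:
    json.dump(records, file, ensure_ascii=False, indent=2)

with open(path, encoding='utf-8') as file:
    loaded = json.load(file)

print(escaped)
print(readable)
```

Both forms load back to the same data; the difference is only whether a person can read the file directly.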

In [15]:
import pandas as pd

member_df = pd.read_json('국회의원현황정보.json')
print(member_df.shape)

(295, 16)
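
For the outline's last step, saving to a table, the notebook's actual database is not shown here. The sketch below assumes SQLite through the standard-library `sqlite3` module as a stand-in; the helper `save_records` and the table name `member` are hypothetical. Column names such as `취미, 특기` contain a comma and a space, so identifiers are double-quoted.

```python
import sqlite3

def save_records(conn, table, records):
    """Create `table` from a list of dicts and insert every record."""
    # Union of keys in first-seen order, so records with missing fields still fit
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)

    def quote(name):
        # Double-quote identifiers so names with spaces or commas are valid SQL
        return '"' + name.replace('"', '""') + '"'

    col_sql = ', '.join(f'{quote(c)} TEXT' for c in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS {quote(table)} ({col_sql})')

    placeholders = ', '.join('?' for _ in columns)
    # Missing keys become NULL via dict.get
    rows = [tuple(record.get(c) for c in columns) for record in records]
    conn.executemany(f'INSERT INTO {quote(table)} VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(':memory:')  # in-memory database, for illustration only
sample = [
    {'이름': '강기윤', '정당': '국민의힘', '취미, 특기': ''},
    {'이름': '강득구', '정당': '더불어민주당'},
]
saved = save_records(conn, 'member', sample)
party_counts = conn.execute(
    'SELECT "정당", COUNT(*) FROM "member" GROUP BY "정당"'
).fetchall()
print(saved, party_counts)
```

With a DataFrame already in hand, pandas' `DataFrame.to_sql` is another route to the same result.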


In [16]:
member_df.head(5)

| | 정당 | 선거구 | 소속위원회 | 당선횟수 | 사무실 전화 | 사무실 호실 | 홈페이지 | 이메일 | 보좌관 | 비서관 | 비서 | 취미, 특기 | 의원실 안내 | 이름 | 이미지 | 생년월일 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 국민의힘 | 경남창원시성산구 | 보건복지위원회 | 재선(19대,21대) | 02-784-1751 | 의원회관937호 | http://blog.naver.com/ggotop | ggotop@naver.com | 김홍광,한영애 | 지상훈,최광림 | 김영록,안효상,이유진,홍지형,김지훈 | | | 강기윤 | https://www.assembly.go.kr/photo/9771230.jpg | 1960-06-04 |
| 1 | 국민의힘 | 대구동구을 | 국방위원회,정치개혁특별위원회 | 초선(21대) | 02-784-5275 | 의원회관341호 | | kds21341@naver.com | 박홍규,정운태 | 유진영,윤미라 | 박순권,김광연,김현정,송민욱 | | | 강대식 | https://www.assembly.go.kr/photo/9771142.jpg | 1959-11-02 |
| 2 | 더불어민주당 | 경기안양시만안구 | 교육위원회,예산결산특별위원회 | 초선(21대) | 02-784-2747~9 | 의원회관440호 | https://blog.naver.com/dulipapa | mainsail440@daum.net | 서용선,안홍식 | 최경순,홍미하 | 문형구,최기섭,조나연,오세령,배은경 | | 강득구의원의'사람중심민생중심'의정활동이국민의삶에힘이되도록열심히하고있습니다.강득구의원... | 강득구 | https://www.assembly.go.kr/photo/9771174.jpg | 1963-05-27 |
| 3 | 국민의힘 | 경남진주시을 | 국회운영위원회,정무위원회,중앙선거관리위원회위원(문상부)선출에관한인사청문특별위원회,정... | 초선(21대) | 02-784-0797 | 의원회관1007호 | | strongwind01@naver.com | 강민승,정경섭 | 국고은,오경훈 | 성환종,사정아,김오주,박정헌,한지은 | | | 강민국 | https://www.assembly.go.kr/photo/9771233.jpg | 1971-03-03 |
| 4 | 더불어민주당 | 비례대표 | 국회운영위원회,교육위원회,예산결산특별위원회 | 초선(21대) | 02-784-2477 | 의원회관421호 | https://blog.naver.com/kmgedu21 | kmj2020edu@gmail.com | 손성조,윤호숙 | 김민혜,김원석 | 김수안,김성용,민지홍,황연미,양진영 | | | 강민정 | https://www.assembly.go.kr/photo/9771283.jpg | 1961-04-26 |
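
The `당선횟수` (times elected) column holds strings such as `초선(21대)` (first term, 21st Assembly) and `재선(19대,21대)` (second term), which cannot be plotted in a histogram directly. One possible way to turn them into numbers is to count the Assembly sessions listed in the parentheses. The helper name `count_terms` is hypothetical, and the sketch assumes every value follows the format seen in the rows above.

```python
import re

def count_terms(elected_text):
    """Count the Assembly sessions (e.g. '19대', '21대') listed in the text."""
    return len(re.findall(r'\d+대', elected_text))

# Values copied from the rows shown above
samples = ['재선(19대,21대)', '초선(21대)']
print([count_terms(s) for s in samples])  # [2, 1]
```

Applied to the DataFrame, this could be written as `member_df['당선횟수'].map(count_terms)` to produce a numeric column for the later charts.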
