## Download JSON data for analysis and visualization

> In this section, we’ll download data with Python’s <b>Requests</b> and <b>Selenium</b> libraries. Because the data is stored in JSON format, we’ll work with it using the json module. Then, using <b>Plotly</b>’s beginner-friendly mapping tools, we’ll create visualizations that make the patterns in the data easy to see. Note: Many geospatial frameworks list the longitude first and then the latitude, because this corresponds to the (x, y) convention used in mathematics. The GeoJSON format follows the (longitude, latitude) convention.
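
The coordinate order mentioned in the note can be checked on a single feature. The sketch below uses a made-up feature (the values and title are illustrative, not taken from a real file) and unpacks its coordinates in the (longitude, latitude) order that GeoJSON uses.

```python
# Hypothetical GeoJSON feature; the values are illustrative only.
feature = {
    "type": "Feature",
    "properties": {"mag": 1.6, "title": "M 1.6 - example location"},
    "geometry": {"type": "Point", "coordinates": [-150.7585, 61.7591, 14.9]},
}

# GeoJSON positions list longitude first, then latitude,
# optionally followed by a third value.
lon, lat = feature["geometry"]["coordinates"][:2]
print(f"lon={lon}, lat={lat}")
```

Slicing with `[:2]` keeps the unpacking valid whether or not a third value is present.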

In [1]:
# coding:utf-8

import requests
import json
import os
from lxml import etree
from selenium import webdriver

query = 'Coffee'
downloadPath = './testimages/'
        
# Download one picture and save it as <downloadPath>/<id>.jpg
def download(src, id):
    path = downloadPath + str(id) + '.jpg'
    if not os.path.isdir(downloadPath):
        os.makedirs(downloadPath)
    if os.path.exists(path):
        print('Existed ' + str(id))
        return
    try:
        pic = requests.get(src, timeout=10)
    except requests.exceptions.RequestException:
        # Skip this picture instead of failing later on an undefined `pic`
        print('Picture cannot be downloaded: ' + str(src))
        return
    with open(path, 'wb') as fp:
        fp.write(pic.content)
    
def searchImages():
    # Request the search results page by page, 20 images per request
    for i in range(0, 900, 20):
        url = 'https://www.douban.com/j/search_photo?q='+query+'&limit=20&start='+str(i)
        html = requests.get(url).text # Response body as JSON text
        print('html:'+html)
        response = json.loads(html) # Parse the JSON text into Python objects (the encoding argument was removed in Python 3.9)
        for image in response['images']:
            print(image['src']) # Show the current image URL
            download(image['src'], image['id']) # Download the current image
            
def getMovieImages():
    url = 'https://movie.douban.com/subject_search?search_text='+ query +'&cat=1002'
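    # Note: newer Selenium 4 releases no longer accept the driver path as a positional
    # argument; there, pass service=Service(path) from selenium.webdriver.chrome.service.
    driver = webdriver.Chrome('/Users/injoy/mydev/python/xpath/chromedriver')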
    driver.get(url)
    html = etree.HTML(driver.page_source)
    # Tip: with the XPath Helper browser extension (Ctrl+Shift+X) you can check that each query matches the intended elements
    src_xpath = "//div[@class='item-root']/a[@class='cover-link']/img[@class='cover']/@src"
    title_xpath = "//div[@class='item-root']/div[@class='detail']/div[@class='title']/a[@class='title-text']"
    
    srcs = html.xpath(src_xpath)
    titles = html.xpath(title_xpath)
    for src, title in zip(srcs, titles):
        print('\t'.join([str(src),str(title.text)]))
        download(src, title.text)
    driver.quit()  # Close the browser and end the WebDriver session

getMovieImages()

https://img3.doubanio.com/view/celebrity/s_ratio_celebrity/public/p20703.webp	吕晶晶 Ching-ching Lui
https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2349265470.webp	咖啡-改变美国的饮料 Coffee: The Drink That Changed America‎ (2016)
https://img9.doubanio.com/view/photo/s_ratio_poster/public/p958678356.webp	咖啡 Coffee‎ (2004)
https://img3.doubanio.com/view/subject/l/public/s25799961.jpg	现代奇迹系列之咖啡 Modern Marvels: Coffee‎ (2005)
https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1829212604.webp	上帝不是咖啡色 Coffee‎ (2003)
https://img1.doubanio.com/view/subject/l/public/s4070809.jpg	Coffee‎ (2004)
https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2198936206.webp	Café‎ (2014)
https://img9.doubanio.com/view/subject/l/public/s25339736.jpg	咖啡 Koffie‎ (2012)
https://img3.doubanio.com/f/movie/30c6263b6db26d055cbbe73fe653e29014142ea3/pics/movie/movie_default_large.png	Koffie‎ (2012)
https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2548783123.webp	走进工厂：咖啡 Inside the Factory:
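
The paging logic in `searchImages()` builds its URLs by string concatenation, which does not escape special or non-ASCII characters in the query. As a minimal sketch, assuming the same endpoint and parameters as the cell above, the URLs could instead be built with `urllib.parse.urlencode`. The helper name `build_page_urls` is hypothetical and only for illustration; no request is sent here.

```python
from urllib.parse import urlencode

# Endpoint taken from the searchImages() function above.
BASE_URL = "https://www.douban.com/j/search_photo"


def build_page_urls(query, total=900, page_size=20):
    """Return one search URL per page of results (hypothetical helper)."""
    urls = []
    for start in range(0, total, page_size):
        params = {"q": query, "limit": page_size, "start": start}
        # urlencode percent-escapes the query so it is safe to put in a URL.
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls


urls = build_page_urls("Coffee")
print(len(urls))
print(urls[0])
```

Each URL can then be passed to `requests.get()` as before. Requests also accepts the same dictionary directly through its `params` argument.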

### JSON-format earthquake data analysis
> We’ve downloaded a data set, in JSON format, representing all the earthquakes that occurred in the world during the previous month. We’ll make a map showing the location of these earthquakes and how significant each one was. Because the data is stored in JSON format, we’ll work with it using the json module. Using Plotly’s beginner-friendly mapping tools for location-based data, we’ll create visualizations that clearly show the global distribution of earthquakes. The json module provides a variety of tools for exploring and working with JSON data. Some of these tools help us reformat the file so we can look at the raw data more easily before we begin to work with it programmatically. The <code>json.load()</code> function converts the data into a format Python can work with: in this case, a giant dictionary. We can then create a file and write the same data to it in a more readable format. The <code>json.dump()</code> function takes a JSON data object and a file object, and writes the data to that file. The <code>indent=4</code> argument tells <code>dump()</code> to format the data using indentation that matches the data’s structure. This <b>GeoJSON</b> file has a structure that’s helpful for location-based data. The information is stored in a list associated with the key "features".
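
The load-then-dump step described above can be sketched as follows. To stay self-contained, this example writes a tiny stand-in data set to a temporary directory first; the file names and sample values are assumptions for illustration, not the real earthquake file.

```python
import json
import os
import tempfile

# Tiny stand-in for the earthquake file; the real data has many more fields.
sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "properties": {"mag": 1.2, "title": "M 1.2 - example"},
            "geometry": {"coordinates": [-116.5, 33.5, 5.0]},
        }
    ],
}

workdir = tempfile.mkdtemp()
compact_path = os.path.join(workdir, "eq_sample.json")            # hypothetical name
readable_path = os.path.join(workdir, "readable_eq_sample.json")  # hypothetical name

# Write the data on a single line, as raw downloads often are.
with open(compact_path, "w") as f:
    json.dump(sample, f)

# json.load() turns the file contents into a Python dictionary.
with open(compact_path) as f:
    all_eq_data = json.load(f)

# json.dump() with indent=4 writes the same data with nested indentation.
with open(readable_path, "w") as f:
    json.dump(all_eq_data, f, indent=4)

with open(readable_path) as f:
    readable_text = f.read()
print(readable_text)
```

Opening the readable file in an editor makes it easier to see that the earthquakes sit in the list stored under the "features" key.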

In [2]:
import json

from plotly.graph_objs import Layout
from plotly import offline

# Explore the structure of the data.
filename = 'data/JSON/eq_data_30_day_m1.json'
with open(filename) as f:
    all_eq_data = json.load(f)

all_eq_dicts = all_eq_data['features']

mags, lons, lats, hover_texts = [], [], [], []
for eq_dict in all_eq_dicts:
    mag = eq_dict['properties']['mag']
    lon = eq_dict['geometry']['coordinates'][0]
    lat = eq_dict['geometry']['coordinates'][1]
    title = eq_dict['properties']['title']
    mags.append(mag)
    lons.append(lon)
    lats.append(lat)
    hover_texts.append(title)

# Map the earthquakes.
data = [{
    'type': 'scattergeo',
    'lon': lons,
    'lat': lats,
    'text': hover_texts,
    'marker': {
        'size': [5*mag for mag in mags],
        'color': mags,
        'colorscale': 'Viridis',
        'reversescale': True,
        'colorbar': {'title': 'Magnitude'},
    },
}]

my_layout = Layout(title='Global Earthquakes')

fig = {'data': data, 'layout': my_layout}
offline.plot(fig, filename='data/global_earthquakes.html')


'global_earthquakes.html'
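
The marker sizes above are computed as `5*mag`, which assumes that every earthquake has a non-negative magnitude. As a hedged sketch, assuming some features might carry a missing (`None`) or negative magnitude, the extraction loop could be wrapped in helpers that skip or clamp such values. The function names `extract_quakes` and `marker_sizes` and the sample features are hypothetical.

```python
def extract_quakes(features):
    """Collect parallel lists of magnitudes, longitudes, latitudes, and titles."""
    mags, lons, lats, titles = [], [], [], []
    for feature in features:
        mag = feature["properties"].get("mag")
        if mag is None:
            continue  # no magnitude reported, so the marker cannot be sized
        lon, lat = feature["geometry"]["coordinates"][:2]
        mags.append(mag)
        lons.append(lon)
        lats.append(lat)
        titles.append(feature["properties"].get("title", ""))
    return mags, lons, lats, titles


def marker_sizes(mags, scale=5):
    """Scale magnitudes to marker sizes, clamping at zero."""
    return [max(0, scale * mag) for mag in mags]


# Illustrative features only; not taken from the real data file.
features = [
    {"properties": {"mag": 1.5, "title": "A"}, "geometry": {"coordinates": [10.0, 20.0, 1.0]}},
    {"properties": {"mag": None, "title": "B"}, "geometry": {"coordinates": [11.0, 21.0, 2.0]}},
    {"properties": {"mag": -0.2, "title": "C"}, "geometry": {"coordinates": [12.0, 22.0, 3.0]}},
]

mags, lons, lats, titles = extract_quakes(features)
sizes = marker_sizes(mags)
print(titles, sizes)
```

The resulting lists can be dropped into the `data` dictionary in place of `mags`, `lons`, `lats`, and `hover_texts`.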