### Packages

In [21]:
using HTTP, DataFrames, CSV
# HTTP for requesting the API URL
# DataFrames for rename!() and select!()
# CSV for parsing the returned CSV data into a DataFrame and for writing CSV files

### Function to get data from API link

In [22]:
# Source: https://blog.gdeltproject.org/gdelt-doc-2-0-api-debuts/
#         https://juliadata.github.io/CSV.jl/stable/index.html
#         https://discourse.julialang.org/t/replacing-missing-really/7952/8
#         https://juliaweb.github.io/HTTP.jl/v0.6/
#         https://github.com/JuliaData/CSV.jl/blob/3007529a8fcddf6553c46d10ed880ec1ced60e22/src/file.jl#L92-L173
#         https://datatofish.com/import-csv-julia/

# Get a CSV from the API by passing in a keyword, mode (tone, volume, ...), source country, and time range
function get_csv(keyword, mode, country, starttime, endtime)
    res = HTTP.get(
        "https://api.gdeltproject.org/api/v2/doc/doc?";
        query = [
            "format" => "csv",
            "query" => "$keyword",
            "mode" => "$mode",
            "STARTDATETIME" => "$starttime",
            "ENDDATETIME" => "$endtime",
            "query" => "sourcecountry:$country",
        ]
    )
    # Parse the response body into a DataFrame; this form also works on CSV.jl
    # versions where CSV.read requires an explicit sink argument
    df = DataFrame(CSV.File(res.body; normalizenames=true))
    return df
end

get_csv (generic function with 1 method)

This function is very similar to the one in the R part. The biggest difference is that in Julia I can add as many entries to the query list as I want without worrying that one of them will be ignored.
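One way to see why the list form keeps every entry is to compare it with a `Dict`, which can hold each key only once. The sketch below uses only Base Julia. `build_query` is a hypothetical helper that joins the pairs without URL-escaping, so it only approximates what HTTP.jl does with the `query` keyword, and it says nothing about how the GDELT API treats a repeated `query` parameter.

```julia
# Hypothetical helper (not part of HTTP.jl): join key/value pairs into a query string.
# No URL-escaping is done here, so this is for illustration only.
build_query(kvs) = join(("$k=$v" for (k, v) in kvs), "&")

# A vector of pairs keeps both "query" entries, in order.
as_vector = ["format" => "csv", "query" => "mass shooting", "query" => "sourcecountry:US"]
vector_query = build_query(as_vector)

# A Dict holds each key once, so the later "query" entry overwrites the earlier one.
as_dict = Dict(as_vector)
dict_query_value = as_dict["query"]

println(vector_query)
println(dict_query_value)
```

If the API turns out to read only one `query` parameter, a possible alternative is to put both conditions into a single string such as `"$keyword sourcecountry:$country"`; that is an assumption to check against the GDELT documentation rather than something tested here.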

### Function to pass dataframes into variables, write into real csv files, and store them in a folder

In [23]:
#Source: https://en.wikibooks.org/wiki/Introducing_Julia/DataFrames
#        http://sashankexpresstech.blogspot.com/2017/09/julia-language-dataframes-renaming.html
#        https://docs.julialang.org/en/v1/base/file/#Base.Filesystem.mkdir


# Fetch the data frames for a keyword and write them to CSV files in a new folder
# Covers average tone and volume intensity for US sources, 2018-01-01 to 2020-10-09
function produce_csv(keyword, path)

    # Create the output folder under `path` and switch to it
    new_name = keyword * " part1"
    dir = joinpath(path, new_name)
    mkpath(dir)   # unlike mkdir, does not error if the folder already exists
    cd(dir)

    # Average tone
    average_tone = get_csv(keyword, "timelinetone", "US", "20180101000000", "20201009230000")
    rename!(average_tone, Dict(:Value => :Average_tone))
    select!(average_tone, Not(2))   # drop the second column
    CSV.write("average_tone.csv", average_tone)

    # Volume intensity
    volume_intensity = get_csv(keyword, "timelinevolinfo", "US", "20180101000000", "20201009230000")
    rename!(volume_intensity, Dict(:Value => :Volume_intensity))
    select!(volume_intensity, Not(2))   # drop the second column
    CSV.write("volume_intensity.csv", volume_intensity)

    # Also return both data frames (already saved to CSV) so they can be inspected
    return [average_tone, volume_intensity]

end

produce_csv (generic function with 1 method)

This part and the one below are also very similar to the R part. Here, however, we added `rename!()` to rename the value column and `select!()` to drop an unnecessary column. After the code is run, the result is also checked for missing values.
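The effect of `rename!()` and `select!()` can be tried on a small data frame first. This is a minimal sketch on made-up data: the `Series` column name and the values are assumptions for illustration, not the actual layout returned by the API.

```julia
using DataFrames

# Toy stand-in for the API result; column names and values are assumed.
toy = DataFrame(Date = ["2018-01-01", "2018-01-02"],
                Series = ["Average Tone", "Average Tone"],
                Value = [-1.3387, -1.1674])

rename!(toy, :Value => :Average_tone)  # rename the column in place
select!(toy, Not(2))                   # drop the second column (Series) in place

println(names(toy))
```

Both functions end in `!`, so they modify `toy` directly instead of returning a copy.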

### Calling the functions

In [25]:
# Getting the directory to produce the csv file
cd("/Users/thongnguyen/Desktop/UCcourses/DATA422-Data Wrangling/Group project/Project data")

# Getting all csv dataframes (average tone, and volume intensity) - keyword (Mass shooting)
mass_shooting = produce_csv("mass shooting", pwd())

2-element Array{DataFrame,1}:
 1013×2 DataFrame
│ Row  │ Date       │ Average_tone │
│      │ Dates.Date │ Float64      │
├──────┼────────────┼──────────────┤
│ 1    │ 2018-01-01 │ -1.3387      │
│ 2    │ 2018-01-02 │ -1.1674      │
│ 3    │ 2018-01-03 │ -1.1686      │
│ 4    │ 2018-01-04 │ -1.2542      │
│ 5    │ 2018-01-05 │ -1.3367      │
│ 6    │ 2018-01-06 │ -1.1259      │
│ 7    │ 2018-01-07 │ -0.8556      │
│ 8    │ 2018-01-08 │ -0.5482      │
│ 9    │ 2018-01-09 │ -0.9509      │
│ 10   │ 2018-01-10 │ -0.8889      │
⋮
│ 1003 │ 2020-09-29 │ -0.8743      │
│ 1004 │ 2020-09-30 │ -1.008       │
│ 1005 │ 2020-10-01 │ -0.9014      │
│ 1006 │ 2020-10-02 │ -1.198       │
│ 1007 │ 2020-10-03 │ -1.3302      │
│ 1008 │ 2020-10-04 │ -1.2722      │
│ 1009 │ 2020-10-05 │ -0.6732      │
│ 1010 │ 2020-10-06 │ -0.9329      │
│ 1011 │ 2020-10-07 │ -0.8615      │
│ 1012 │ 2020-10-08 │ -0.9183      │
│ 1013 │ 2020-10-09 │ -1.2604      │
 1013×22 DataFrame. Omitted printing of 20

In [26]:
# Checking missing values
ismissing(mass_shooting)

false

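`false` here only says that `mass_shooting` itself (the array holding the two data frames) is not the `missing` value. `ismissing` does not look inside the data frames, so this result does not yet confirm that the dataset has no missing values; that needs a check over each column.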
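A per-column check is one way to look for missing values inside each data frame. The sketch below is a minimal example on toy data; `has_missing` is a hypothetical helper, not something from the original notebook or from DataFrames.jl.

```julia
using DataFrames

# Hypothetical helper: true if any cell in any column of `df` is `missing`.
has_missing(df::AbstractDataFrame) = any(col -> any(ismissing, col), eachcol(df))

# Toy data, one frame without and one with a missing value.
complete = DataFrame(Date = ["2018-01-01", "2018-01-02"], Value = [-1.3, -1.1])
with_gap = DataFrame(Date = ["2018-01-01", "2018-01-02"], Value = [-1.3, missing])

frames = [complete, with_gap]

per_frame = has_missing.(frames)   # one Bool per data frame
on_container = ismissing(frames)   # only tests the container itself

println(per_frame)
println(on_container)
```

Applied to the result above, `has_missing.(mass_shooting)` would give one result per data frame.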

In [27]:
mass_shooting

2-element Array{DataFrame,1}:
 1013×2 DataFrame
│ Row  │ Date       │ Average_tone │
│      │ Dates.Date │ Float64      │
├──────┼────────────┼──────────────┤
│ 1    │ 2018-01-01 │ -1.3387      │
│ 2    │ 2018-01-02 │ -1.1674      │
│ 3    │ 2018-01-03 │ -1.1686      │
│ 4    │ 2018-01-04 │ -1.2542      │
│ 5    │ 2018-01-05 │ -1.3367      │
│ 6    │ 2018-01-06 │ -1.1259      │
│ 7    │ 2018-01-07 │ -0.8556      │
│ 8    │ 2018-01-08 │ -0.5482      │
│ 9    │ 2018-01-09 │ -0.9509      │
│ 10   │ 2018-01-10 │ -0.8889      │
⋮
│ 1003 │ 2020-09-29 │ -0.8743      │
│ 1004 │ 2020-09-30 │ -1.008       │
│ 1005 │ 2020-10-01 │ -0.9014      │
│ 1006 │ 2020-10-02 │ -1.198       │
│ 1007 │ 2020-10-03 │ -1.3302      │
│ 1008 │ 2020-10-04 │ -1.2722      │
│ 1009 │ 2020-10-05 │ -0.6732      │
│ 1010 │ 2020-10-06 │ -0.9329      │
│ 1011 │ 2020-10-07 │ -0.8615      │
│ 1012 │ 2020-10-08 │ -0.9183      │
│ 1013 │ 2020-10-09 │ -1.2604      │
 1013×22 DataFrame. Omitted printing of 20