Manual fix for missing events.2017 file. #57

Closed
andybega opened this issue May 6, 2020 · 5 comments

@andybega
Owner

andybega commented May 6, 2020

The 2017 events file is missing from the repo after the last update on 4/27. Keep the local copy if one is present.

@andybega
Owner Author

andybega commented May 8, 2020

Keep this open in case a 2017 file shows up and this code can be taken out.

@andybega
Owner Author

andybega commented Jun 9, 2020

It has been added, but with a non-standard file name:

Events.2017.20200602.tab.zip

versus the standard:

events.2019.20200427085336.tab
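
For illustration, a minimal sketch of how both naming variants could be recognized with one pattern. `parse_events_filename()` is a hypothetical helper, not a function from the package, and the pattern only assumes what the two names above show: a prefix that may be capitalized, a four-digit year, a numeric version stamp of 8 to 14 digits, and an optional `.zip` suffix.

```r
# Hypothetical helper: split an events file name into year and version stamp,
# tolerating "Events" vs "events", a short date-only stamp, and ".zip".
parse_events_filename <- function(x) {
  pattern <- "^events\\.([0-9]{4})\\.([0-9]{8,14})\\.tab(\\.zip)?$"
  m <- regmatches(x, regexec(pattern, x, ignore.case = TRUE))
  # Pull capture group i from each match, or NA where the name did not match
  grab <- function(i) {
    vapply(m, function(p) if (length(p) > 0) p[i] else NA_character_,
           character(1))
  }
  data.frame(
    file    = x,
    matched = lengths(m) > 0,
    year    = as.integer(grab(2)),
    version = grab(3),
    stringsAsFactors = FALSE
  )
}

parse_events_filename(c("Events.2017.20200602.tab.zip",
                        "events.2019.20200427085336.tab"))
```

Matching case-insensitively keeps the standard names working while also picking up the capitalized 2017 file.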

@andybega
Owner Author

andybega commented Jun 9, 2020

The file contents are also problematic:

Warning: 729752 parsing failures.
row        col   expected   actual                                             file
  1 Event Date date like  1/1/2017 '~/Work/data/icews/raw/Events.2017.20200602.tab'
  2 Event Date date like  1/1/2017 '~/Work/data/icews/raw/Events.2017.20200602.tab'
  3 Event Date date like  1/1/2017 '~/Work/data/icews/raw/Events.2017.20200602.tab'
  4 Event Date date like  1/1/2017 '~/Work/data/icews/raw/Events.2017.20200602.tab'
  5 Event Date date like  1/1/2017 '~/Work/data/icews/raw/Events.2017.20200602.tab'
... .......... .......... ........ ................................................
See problems(...) for more details.

Error: NOT NULL constraint failed: events.event_date
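
The warning suggests the parser expected ISO-style dates but the 2017 file has values like `1/1/2017`. One way to handle both, sketched here in base R and not necessarily how the package does it: read the column as character, try the ISO format first, and fall back to month/day/year for whatever fails. `parse_event_date()` is a hypothetical name, and the assumption that the other files use `Y-m-d` is inferred from the warning only.

```r
# Hypothetical helper: parse event dates given as character strings.
# Tries ISO "Y-m-d" first, then "m/d/Y" for the values that did not parse.
parse_event_date <- function(x) {
  out <- as.Date(x, format = "%Y-%m-%d")
  bad <- is.na(out) & !is.na(x)
  out[bad] <- as.Date(x[bad], format = "%m/%d/%Y")
  out
}

parse_event_date(c("2019-04-27", "1/1/2017", "12/31/2017"))
```

Values that match neither format stay `NA`, so they would still trip the NOT NULL constraint instead of being silently written with a wrong date.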

@andybega andybega reopened this Oct 2, 2020
@andybega
Owner Author

andybega commented Oct 2, 2020

The fix for the date issue in cb75754 allows parsing, but the file still has an issue: all values in the longitude column are missing.

> foo = read_events_tsv(find_raw("Events.2017.20200602.tab"))
|======================================================================| 100%  181 MB
Warning: 46 parsing failures.
  row      col expected actual                                             file
 6660 Latitude a double   NULL '~/Work/data/icews/raw/Events.2017.20200602.tab'
31770 Latitude a double   NULL '~/Work/data/icews/raw/Events.2017.20200602.tab'
56746 Latitude a double   NULL '~/Work/data/icews/raw/Events.2017.20200602.tab'
58501 Latitude a double   NULL '~/Work/data/icews/raw/Events.2017.20200602.tab'
58502 Latitude a double   NULL '~/Work/data/icews/raw/Events.2017.20200602.tab'
..... ........ ........ ...... ................................................
See problems(...) for more details.

> dim(foo)
[1] 729706     20
> colnames(foo)
 [1] "event_id"        "event_date"      "source_name"     "source_sectors" 
 [5] "source_country"  "event_text"      "cameo_code"      "intensity"      
 [9] "target_name"     "target_sectors"  "target_country"  "story_id"       
[13] "sentence_number" "publisher"       "city"            "district"       
[17] "province"        "country"         "latitude"        "longitude"      
> head(foo$latitude)
[1] 39.0339 19.0728 -3.3822 39.9075 38.8951 38.8951
> head(foo$longitude)
[1] NA NA NA NA NA NA

This is something ICEWS should fix on dataverse. (TODO)
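
A quick check like the following could flag this kind of problem before a file gets ingested. It is only a sketch: `all_missing_cols()` is a hypothetical helper, and the toy data frame stands in for the real events table.

```r
# Hypothetical helper: names of the columns in which every value is NA.
all_missing_cols <- function(df) {
  names(df)[vapply(df, function(col) all(is.na(col)), logical(1))]
}

# Toy data mimicking the problem above (values are illustrative only)
toy <- data.frame(
  latitude  = c(39.0339, 19.0728, NA),
  longitude = c(NA_real_, NA_real_, NA_real_)
)
all_missing_cols(toy)
```

Here only `longitude` is reported, since `latitude` has some non-missing values.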

@andybega
Owner Author

andybega commented Oct 8, 2020

There is a new version of Events.2017 (dated 6 October 2020), which now has the longitudes.

foo = read_events_tsv(find_raw("Events.2017.20201006.tab"))
sapply(foo, function(x) sum(is.na(x))) %>% enframe()
# A tibble: 19 x 2
   name            value
   <chr>           <int>
 1 event_id            0
 2 event_date          0
 3 source_name         0
 4 source_sectors      0
 5 source_country      0
 6 event_text          0
 7 intensity           0
 8 target_name         0
 9 target_sectors      0
10 target_country      0
11 story_id            0
12 sentence_number     0
13 publisher           0
14 city                0
15 district            0
16 province            0
17 country             0
18 latitude           46
19 longitude          46

The event dates are still non-standard ("m/d/Y") though, so the fix in cb75754 is still required. Otherwise this should be good to close.
