The released dataset contains 39,772 geotagged English-language tweets related to earthquakes in Japan, collected with the Twitter Standard Streaming API. The tweets were posted between March 1, 2020 and February 28, 2021 (a one-year period) and contain the terms "earthquake" and "Japan" in their text.
The provided geoinformation is generated by an automatic geotagging methodology that transforms English tweets into georeferenced data by using their textual content to detect mentioned locations. After preprocessing, a pre-trained Bidirectional Long Short-Term Memory (biLSTM)-based Named Entity Recognition (NER) model retrieves location-type mentions from the tweet’s text. Terms recognized as places (single-word, e.g. “Tokyo”, or multi-word, e.g. “Sendai Airport”) are then associated with a geographical point (a pair of coordinates) through a query to the OpenStreetMap API.
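The geocoding step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that Nominatim's public search endpoint stands in for the "OpenStreetMap API", and the function names `build_search_url` and `to_detected_location` are hypothetical. The sample result is hand-written with illustrative values; no network request is made.

```python
from urllib.parse import urlencode

# Assumption: Nominatim's search endpoint is used as the OpenStreetMap geocoder.
NOMINATIM_SEARCH_URL = "https://nominatim.openstreetmap.org/search"


def build_search_url(mention: str) -> str:
    """Build a geocoding search URL for one location mention found by NER."""
    params = {"q": mention, "format": "json", "limit": 1}
    return f"{NOMINATIM_SEARCH_URL}?{urlencode(params)}"


def to_detected_location(mention: str, result: dict) -> dict:
    """Convert one Nominatim-style search result into a dataset-style entry."""
    return {
        "location_in_text": mention,
        "location_fullname": result["display_name"],
        "geometry": {
            "type": "Point",
            # The dataset documents the order as [latitude, longitude].
            "coordinates": [float(result["lat"]), float(result["lon"])],
        },
    }


# Hand-written, illustrative result (Nominatim returns lat/lon as strings).
sample_result = {"display_name": "Tokyo, Japan", "lat": "35.68", "lon": "139.76"}
entry = to_detected_location("Tokyo", sample_result)
```

Keeping the URL construction separate from the conversion makes it possible to try the conversion on saved responses without sending requests to the geocoder.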
Property Name | Property Type | Description |
---|---|---|
_id | String | The unique identifier of a tweet, as provided by Twitter. |
detected_locations | Array | An array of objects that contain information about the locations that have been extracted from a tweet’s text. |
location_in_text | String | The word(s) in a tweet’s text that has/have been recognized as locations after analysis. |
location_fullname | String | The full location name as retrieved by the OpenStreetMap API. |
geometry | JSON Object | A JSON object that contains information about the coordinates of the location. |
type | String (predefined value: "Point") | A field that defines the type of the coordinates. |
coordinates | Array (format [latitude,longitude]) | An array of Double values that refer to the latitude and longitude coordinates of the location, as retrieved by the OpenStreetMap API. |
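To make the structure concrete, the sketch below reads one record and lists its detected locations. It assumes that records are stored as JSON objects, that `location_in_text`, `location_fullname` and `geometry` are nested inside each element of `detected_locations`, and that `type` and `coordinates` are nested inside `geometry`. The sample record, its placeholder ID and its coordinate values are invented for illustration, and `extract_points` is a hypothetical helper name.

```python
import json

# Hypothetical record following the property table; values are illustrative only.
sample_record = json.loads("""
{
  "_id": "0000000000000000000",
  "detected_locations": [
    {
      "location_in_text": "Tokyo",
      "location_fullname": "Tokyo, Japan",
      "geometry": {"type": "Point", "coordinates": [35.68, 139.76]}
    }
  ]
}
""")


def extract_points(record: dict) -> list:
    """Return (location_in_text, latitude, longitude) for each detected location."""
    points = []
    for location in record.get("detected_locations", []):
        geometry = location.get("geometry", {})
        if geometry.get("type") != "Point":
            continue  # Skip anything that is not a single coordinate pair.
        latitude, longitude = geometry["coordinates"]
        points.append((location["location_in_text"], latitude, longitude))
    return points


points = extract_points(sample_record)
```

Note that the GeoJSON standard orders positions as [longitude, latitude], so the pair may need to be swapped before the geometry is passed to GeoJSON-aware tools.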
This dataset is licensed under the Creative Commons Attribution-NonCommercial International Public License (CC BY-NC). When downloading tweets by means of the distributed Tweet IDs, users must comply with Twitter’s Developer Agreement and Policy. By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service, and cite the following manuscript.
Andreadis, S., Gialampoukidis, I., Manconi, A., Cordeiro, D., Conde, V., Sagona, M., Brito, F., Pantelidis, N., Mavropoulos, T., Grosso, N. and Vrochidis, S., 2022. Earthquakes: From Twitter Detection to EO Data Processing. IEEE Geoscience and Remote Sensing Letters, 19, pp.1-5.
BibTeX:
@article{andreadis2022earthquakes,
title={Earthquakes: From Twitter Detection to EO Data Processing},
author={Andreadis, Stelios and Gialampoukidis, Ilias and Manconi, Andrea and Cordeiro, David and Conde, Vasco and Sagona, Manuela and Brito, Fabrice and Pantelidis, Nick and Mavropoulos, Thanassis and Grosso, Nuno and Vrochidis, Stefanos},
journal={IEEE Geoscience and Remote Sensing Letters},
volume={19},
pages={1--5},
year={2022},
publisher={IEEE}
}
If you have any further questions about the dataset or are interested in running additional analyses on the data, please contact Stelios Andreadis at andreadisst@iti.gr.
Stelios Andreadis, Nick Pantelidis, Thanassis Mavropoulos, Ilias Gialampoukidis, Stefanos Vrochidis, Ioannis Kompatsiaris
Information Technologies Institute (ITI), Centre for Research and Technology Hellas (CERTH)