# UFO Question 

Our data science team has predicted that the Earth is going to be invaded by an alien force in the
next few years. Our only hope is to replicate a device that can block all alien technology within a radius of
~300 km. Sadly, the device was sold in 2004 to an anonymous buyer who wanted to protect her hometown, and
we don't know how to contact her again. We know that the device has been active since 2004 in one
city in the USA, and we want to know where to start our search.

We've included a dataset called ufo.csv. This dataset contains over 80,000 reports of UFO sightings
over the last century (all of them verified by the ESA). Using this dataset, try to guess the city in
which the device has been hidden.


In [61]:
# Import libraries
from pymongo import MongoClient
import pandas as pd
import folium
from folium import plugins

In [62]:
# Import data
client=MongoClient('mongodb://localhost:27017')
db = client.AlientDB
colec = db.UFO
db.list_collection_names() #check

['UFO', 'Location']

In [63]:
# Select data: keep only sightings within the USA from years after 2004.
# The year is matched with a regex (2005-2019), so it is assumed to be stored as a string.
df_ufo = pd.DataFrame(colec.find({'country': 'us', 'year': {'$regex': r'^(200[5-9]|201\d)$'}},
            {'_id': 0, 'latitude': 1, 'longitude': 1, 'year': 1}).sort('year'))  # build the working DataFrame straight from the cursor
df_ufo.sample()


Unnamed: 0,latitude,longitude,year
31322,44.8547222,-93.4705556,2012
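
The year filter relies on a regular expression applied to year strings. As a quick sanity check, here is a minimal sketch using Python's `re` module with a fully anchored variant of that pattern. The helper name `is_after_2004` and the sample years are made up for illustration.

```python
import re

# Fully anchored pattern: matches exactly the strings '2005' to '2019'.
YEAR_PATTERN = re.compile(r'^(200[5-9]|201\d)$')


def is_after_2004(year: str) -> bool:
    """Return True if the year string falls between 2005 and 2019."""
    return YEAR_PATTERN.match(year) is not None


# Hypothetical sample values, including edge cases around the boundaries.
sample_years = ['1999', '2004', '2005', '2012', '2019', '2020', '20123']
kept = [y for y in sample_years if is_after_2004(y)]
print(kept)  # ['2005', '2012', '2019']
```

Grouping both alternatives inside one pair of anchors keeps malformed values such as `'20123'` out of the selection.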


In [64]:
# Find the city
# First, we add a new column holding the point information. To do so, we define a helper function.
def get_first(latitude, longitude):
    Location = None
    if latitude and longitude:
        # This is already a GeoJSON point that can be used in geo queries.
        # GeoJSON order is [longitude, latitude].
        Location = {
            'type': 'Point',
            'coordinates': [longitude,
                            latitude]
        }

    return Location

In [65]:
# Apply the function to every row of the DataFrame to generate the new column.

df_ufo['Location'] = df_ufo.apply(lambda x: get_first(x.latitude, x.longitude), axis=1).dropna()

print(dict(df_ufo['Location'].sample(1)))
df_ufo.sample(5)

{24664: {'type': 'Point', 'coordinates': ['-117.0855556', '33.1191667']}}


Unnamed: 0,latitude,longitude,year,Location
21101,42.6405556,-84.5152778,2010,"{'type': 'Point', 'coordinates': ['-84.5152778..."
37467,42.5583333,-70.8805556,2013,"{'type': 'Point', 'coordinates': ['-70.8805556..."
11847,41.2958333,-86.625,2008,"{'type': 'Point', 'coordinates': ['-86.625', '..."
7066,34.2977778,-83.8241667,2006,"{'type': 'Point', 'coordinates': ['-83.8241667..."
27478,41.6269444,-88.2038889,2011,"{'type': 'Point', 'coordinates': ['-88.2038889..."
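
The output above shows the coordinates stored as strings, whereas GeoJSON positions are expected to be numbers. The following is a minimal sketch of a stricter helper that casts to `float` and rejects out-of-range values. The name `to_geojson_point` is hypothetical and not part of the original notebook.

```python
from typing import Optional


def to_geojson_point(latitude, longitude) -> Optional[dict]:
    """Build a GeoJSON Point with numeric coordinates, or None if invalid."""
    try:
        lat, lon = float(latitude), float(longitude)
    except (TypeError, ValueError):
        # Missing or non-numeric input: no point can be built.
        return None
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        # Out-of-range values (and NaN) are rejected.
        return None
    # GeoJSON order is [longitude, latitude].
    return {'type': 'Point', 'coordinates': [lon, lat]}


print(to_geojson_point('33.1191667', '-117.0855556'))
# {'type': 'Point', 'coordinates': [-117.0855556, 33.1191667]}
```

It could be applied the same way as `get_first`, for example `df_ufo.apply(lambda x: to_geojson_point(x.latitude, x.longitude), axis=1)`.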


In [66]:
# Let's save the data to a JSON file. This step is not strictly necessary, but it is useful.
df_ufo.to_json('UFOinUsSince2004.json')    # Write the DataFrame to a JSON file
db.Location.insert_many(df_ufo.to_dict('records'))  # Insert the records into the MongoDB collection


<pymongo.results.InsertManyResult at 0x7f0f1d324cc0>
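
Once the points are stored in MongoDB, a geospatial filter could select the sightings around a candidate city. The sketch below only builds the filter document for the `$nearSphere` operator. It assumes the `Location` field holds GeoJSON points with numeric coordinates and that a `2dsphere` index exists on it. The helper name `near_sphere_filter` is hypothetical, and the database calls are left as comments because they need a running server.

```python
def near_sphere_filter(longitude: float, latitude: float, radius_km: float) -> dict:
    """Build a MongoDB filter for points within radius_km of a centre."""
    return {
        'Location': {
            '$nearSphere': {
                '$geometry': {
                    'type': 'Point',
                    'coordinates': [longitude, latitude],  # GeoJSON order
                },
                '$maxDistance': radius_km * 1000,  # MongoDB expects metres
            }
        }
    }


# Centre taken from the Nashville map used in this notebook.
nashville_filter = near_sphere_filter(-86.76, 36.17, 300)
print(nashville_filter)

# With a live database (assumption: numeric coordinates are stored):
# db.Location.create_index([('Location', '2dsphere')])
# nearby = list(db.Location.find(nashville_filter, {'_id': 0}))
```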

In [123]:
# Heatmap of all selected sightings, centered roughly on the continental USA
mapa3=folium.Map([43, -100], zoom_start=3.5)

data=df_ufo[['latitude', 'longitude']]

mapa3.add_child(plugins.HeatMap(data, radius=15))

#mapa3.save('images/heatmap.html')

In [124]:
# Let's make a plot around Nashville, as it seems a good candidate for our search
mapa4=folium.Map([36.17, -86.76], zoom_start=6)
mapa4.add_child(plugins.HeatMap(data, radius=15))

# Let's save the resulting maps
mapa3.save('AllUs.html')
mapa4.save('Nashville.html')
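
The heatmap gives a visual impression only. To put a number on a candidate city, one option is to count the sightings that fall within the device's ~300 km radius. The sketch below uses the haversine formula with a mean Earth radius of 6371 km. The function names are hypothetical, and the sample points are simply a few coordinates taken from the outputs shown earlier.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def count_within(points, center, radius_km=300.0):
    """Count (lat, lon) points lying within radius_km of center."""
    clat, clon = center
    return sum(
        1 for lat, lon in points
        if haversine_km(float(lat), float(lon), clat, clon) <= radius_km
    )


# A few coordinates from the sample rows printed above (strings, as stored).
points = [
    ('44.8547222', '-93.4705556'),
    ('42.6405556', '-84.5152778'),
    ('41.2958333', '-86.625'),
    ('34.2977778', '-83.8241667'),
]
nashville = (36.17, -86.76)
print(count_within(points, nashville))
```

On the full data this could be called with `df_ufo[['latitude', 'longitude']].values.tolist()` and repeated for several candidate cities to compare counts.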


In [125]:
# Let's try a different plot now: it shows all the data, animated over time.
# (We group by year to keep it simple, but the same approach would work with days, given more time.)
df_year_list = []
df_ufo['count'] = 1  # helper column used to count how many sightings share the same coordinates each year
for year in df_ufo.year.sort_values().unique():
    df_year_list.append(df_ufo.loc[df_ufo.year == year, 
                                    ['latitude', 
                                     'longitude', 
                                     'count']].groupby(['latitude', 
                                                        'longitude']).sum().reset_index().values.tolist())


def generateBaseMap(default_location=[36.17, -86.76], default_zoom_start=5):
    
    base_map = folium.Map(location=default_location, 
                          control_scale=False, 
                          tiles='stamentoner',
                          zoom_start=default_zoom_start)
    
    return base_map

base_map = generateBaseMap() # Let's create the map
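
`HeatMapWithTime` takes one list per time step, each containing `[lat, lng]` or `[lat, lng, weight]` entries, and folium documents the weight as lying in the range (0, 1]. The sketch below builds that structure in plain Python and scales the raw counts by the global maximum. The function name `build_yearly_frames` and the toy rows are made up for illustration.

```python
from collections import Counter, defaultdict


def build_yearly_frames(rows):
    """Turn (year, lat, lon) rows into per-year [lat, lon, weight] lists.

    Weights are sighting counts divided by the largest count overall,
    so every weight ends up in the range (0, 1].
    """
    counts = defaultdict(Counter)
    for year, lat, lon in rows:
        counts[year][(float(lat), float(lon))] += 1

    peak = max((n for c in counts.values() for n in c.values()), default=1)
    years = sorted(counts)
    frames = [
        [[lat, lon, n / peak] for (lat, lon), n in sorted(counts[y].items())]
        for y in years
    ]
    return years, frames


# Toy rows (hypothetical): two sightings at one spot in 2010, one in 2011.
rows = [('2010', '1', '2'), ('2010', '1', '2'), ('2011', '3', '4')]
years, frames = build_yearly_frames(rows)
print(years)   # ['2010', '2011']
print(frames)  # [[[1.0, 2.0, 1.0]], [[3.0, 4.0, 0.5]]]
```

The returned `years` list could be passed as the `index` argument of `HeatMapWithTime` so that each animation frame is labeled with its year.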

In [133]:
base_map = generateBaseMap() # Let's create the map
HMWT = plugins.HeatMapWithTime(df_year_list, 
     radius=5, 
     gradient={0.1: 'blue', 0.3: 'lime', 0.5: 'yellow', 0.7: 'orange', 1: 'red'},
     min_opacity=1, 
     max_opacity=1, 
     auto_play=True,
     use_local_extrema=False
     )

base_map.add_child(HMWT)
  # show plot

**Looks like the city we are looking for is Nashville**