![alt text](https://github.com/callysto/callysto-sample-notebooks/blob/master/notebooks/images/Callysto_Notebook-Banner_Top_06.06.18.jpg?raw=true)

# Introduction to Open Data




## What is "Open Data"?
Open data is data that is freely available to use, manipulate, change, redistribute and analyze without restriction. Open data can be accessed by anyone, anywhere, at any time, provided they have access to the Internet. There is mounting pressure around the world for both governments and private entities to make more data available for analysis (and therefore scrutiny) by the public. The idea behind these initiatives is that they will allow for greater transparency from governments and businesses toward the general public. As more data becomes available, learning how to find and work with that data is becoming ever more important. In our "Open Data 101" tutorial series, we'll introduce you to some of the basic tools and techniques you can use in a Jupyter notebook to load, analyze and interpret open data.


## Are Some Datasets Better Than Others? 

Objectively speaking, the simple answer is 'yes'. Regardless of the question you're trying to answer or the analysis you're trying to do, just because data is open doesn't necessarily mean that it is going to be easy to work with! Here are a few questions to ask to make sure the data set you've chosen will be easy to manipulate and work with.

1. If the data is in multiple files, is it consistently labeled, and is it easy to relate quantities between files?  

1. Is the data in a format that's easy to load on your computer such as `csv`, `txt`, `json`, or Excel?   
   1. A well-formatted HTML table on a website is also an excellent format!  

1. If you're working with spatial data, are GPS coordinates already included (where relevant)?   
1. Is the dataset complete for your problem? If values are missing, is there a good reason for them to be missing and are they very clearly labelled as missing?
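
Two of the checks above (relating quantities between files, and spotting missing values) can be sketched in a few lines of Python. This is only an illustration using the standard library `csv` module: the two small "files", their column names, and the helper functions `load_rows`, `count_missing` and `unmatched_keys` are all hypothetical and stand in for whatever data set you actually download.

```python
import csv
import io

# Hypothetical sample data standing in for two downloaded CSV files.
STATIONS_CSV = """station_id,name,latitude,longitude
S1,Downtown,51.05,-114.07
S2,Airport,51.13,-114.01
"""

READINGS_CSV = """station_id,date,temperature_c
S1,2018-06-01,14.2
S1,2018-06-02,
S2,2018-06-01,13.8
S3,2018-06-01,15.0
"""


def load_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def count_missing(rows):
    """Count the empty cells found in each column."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            if value is None or value.strip() == "":
                counts[column] = counts.get(column, 0) + 1
    return counts


def unmatched_keys(rows, lookup_rows, key):
    """Return the key values in `rows` that never appear in `lookup_rows`."""
    known = {row[key] for row in lookup_rows}
    return sorted({row[key] for row in rows} - known)


stations = load_rows(STATIONS_CSV)
readings = load_rows(READINGS_CSV)

# Which columns have gaps, and how many?
missing = count_missing(readings)

# Which readings refer to a station that the other file does not describe?
orphans = unmatched_keys(readings, stations, "station_id")

print("Missing values per column:", missing)
print("Station IDs with no match:", orphans)
```

In this made-up example, one temperature reading is blank and one reading refers to a station that is not listed in the stations file. Both are warning signs worth investigating before any analysis.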


For certain projects, it matters whether a data set is maintained (is it updated as new data become available?). Typically the download page will mention whether the data set is maintained, or at the very least show a time stamp of the last time the data set was updated. 
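
As a rough sketch of how you might use such a time stamp, the snippet below computes how many days have passed since a data set was last updated. The function name `days_since_update` and both dates are hypothetical, and the snippet assumes the time stamp is written in ISO format (`YYYY-MM-DD`).

```python
from datetime import date


def days_since_update(last_updated, today):
    """Return the number of days between an ISO date string and `today`."""
    return (today - date.fromisoformat(last_updated)).days


# Hypothetical values: a "last updated" date copied from a download page,
# and the date on which we are checking it.
last_updated = "2018-06-06"
age = days_since_update(last_updated, date(2018, 9, 1))

print("Days since last update:", age)
```

Whether a given age is acceptable depends on your project: census data may only change every few years, while transit or weather data can go stale within hours.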


### Reliable Sources
Another problem with open data is that some data sets may be dubious: constructed in a way that is intentionally misleading, or simply incorrect. 
Much like any other source of information, a data set may also be affected by any biases the curator of the data may have (as well as your own!). Because of this, it is important to try to find an unbiased source of data, or at the very least multiple sources of data which may have different biases. For example, if you're looking for data in order to study the health effects of tobacco, a data set from a study funded by a tobacco company may show entirely different results than one from an independent study. As this is the case for all data sets, it is important to be conscious of your data's origins before you draw any major conclusions from your analysis. 




---
## What Kind of Open Data Sets Are There, and Where Can I Find Them?

There are open data sets available online on almost anything you can think of. The key to finding this data is knowing where to look, and to that end we have provided a few links to open data portals to get you started in your search for the perfect data set.


### Government Data
#### Canadian Data
1. [Statistics Canada](https://www150.statcan.gc.ca/n1/en/type/data?MM=1)
1. [Canada Open Data](https://open.canada.ca/en) (Contains many data sets for provinces or territories without their own open data portal)
1. [Alberta Open Data](https://open.alberta.ca/opendata)
1. [BC Open Data](https://data.gov.bc.ca/)
1. [Saskatchewan Open Data](http://www.opendatask.ca/)
1. [Northwest Territories Open Data](https://www.opennwt.ca/)
1. [Ontario Open Data](https://www.ontario.ca/search/data-catalogue)
1. [Quebec City Open Data](http://donnees.ville.quebec.qc.ca/catalogue.aspx) (French only)
1. [Nova Scotia Open Data](https://data.novascotia.ca/)
1. [PEI Open Data](https://data.princeedwardisland.ca/)
1. [Calgary Open Data Portal](https://data.calgary.ca/)
1. [Edmonton Open Data Portal](https://data.edmonton.ca/)
1. [Vancouver Open Data Portal](https://vancouver.ca/your-government/open-data-catalogue.aspx)
1. [Toronto Open Data Portal](https://www.toronto.ca/city-government/data-research-maps/open-data/)
1. [Winnipeg Open Data Portal](https://data.winnipeg.ca/)
1. [Whitehorse Open Data](http://data.whitehorse.ca)

Note 1: Many cities today have their own open data portals, which can often be found through a Google search for "`CITY NAME` open data portal".
    
Note 2: Open data portals for the provinces and territories are still launching at the time of writing.

#### Other Governments
1. [EU Open Data](https://open-data.europa.eu/)
1. [USA](https://www.data.gov/)
1. [Australia](https://data.gov.au/)
1. [NASA](https://open.nasa.gov/open-data/)
1. [Russia](http://data.gov.ru/?language=en) (Site is in English, but naturally many of the datasets are in Russian)

Note 3: Many countries also have their own open data portals; this is not an exhaustive list. 

    
### Data Aggregators  
1. [Kaggle](https://www.kaggle.com/datasets)
1. [Open Data Soft](https://data.opendatasoft.com/pages/home/)
1. [Open Africa](https://africaopendata.org/)
1. [List of interesting data sets](https://github.com/awesomedata/awesome-public-datasets)
1. [Open Data Network](https://www.opendatanetwork.com/) (Technically speaking, this is a data set search engine)
1. [Google Public Data](https://www.google.com/publicdata/directory)

---

This is by no means an exhaustive list - if you're looking for something more specific, it's often fairly easy to find a data set relevant to you with a simple Google search!




## Working With Open Data

See our series of open data examples!

1. [Car mileage data part 1](Importing%20and%20working%20with%20open%20data.ipynb)
1. [Car mileage data part 2](Importing%20and%20Working%20With%20Open%20Data%20Part%202.ipynb)
1. [Lotto 649 Data](Lottery.ipynb)
1. [Meteorite Landings Part 1](Meteorite%20Landings.ipynb)
1. [Meteorite Landings Part 2](Meteorite%20Landings%20Part%202.ipynb)




![alt text](https://github.com/callysto/callysto-sample-notebooks/blob/master/notebooks/images/Callysto_Notebook-Banners_Bottom_06.06.18.jpg?raw=true)