# Introduction to Emissions and the Atmosphere

Emissions are gases and particles that are released into the air. They can come from a variety of sources, including cars, factories, power plants, and even natural sources like volcanoes.

When we talk about emissions, we're often talking about greenhouse gas emissions. These are gases like carbon dioxide, methane, and nitrous oxide that trap heat in the Earth's atmosphere and contribute to global warming and climate change.


![Screen%20Shot%202023-03-07%20at%202.57.45%20PM.png](attachment:Screen%20Shot%202023-03-07%20at%202.57.45%20PM.png) 

Image source: https://www.sciencelearn.org.nz/image_maps/3-carbon-cycle

#### Every year around 10 gigatons of carbon (that’s 10 × 10⁹ tons of carbon per year, written as GtC/yr) is emitted by combustion of fossil fuels. Where does all this carbon come from? Where does it all go? How does it get there? And is it possible to get it there faster? Take a look at the carbon cycle above.
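To get a feel for the size of that number, here is a minimal sketch that writes 10 GtC/yr out in tons and converts it to the equivalent mass of CO2. The conversion uses standard molar masses; the variable and function names are made up for this example.

```python
# Molar masses in g/mol (standard values).
M_C = 12.011    # carbon
M_CO2 = 44.009  # carbon dioxide (one carbon atom + two oxygen atoms)

def carbon_to_co2(gtc):
    """Convert a mass of carbon (GtC) to the equivalent mass of CO2 (GtCO2)."""
    return gtc * M_CO2 / M_C

emissions_gtc = 10                     # GtC/yr, the figure quoted above
emissions_tons = emissions_gtc * 1e9   # 1 gigaton = 10^9 tons
emissions_gtco2 = carbon_to_co2(emissions_gtc)

print(f"{emissions_gtc} GtC/yr = {emissions_tons:.1e} tons of carbon per year")
print(f"Equivalent CO2 mass: about {emissions_gtco2:.1f} GtCO2/yr")
```

The CO2 figure is larger than the carbon figure because each carbon atom is carrying two oxygen atoms with it.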

Now, let's talk about the atmosphere. The atmosphere is a layer of gas that surrounds the Earth. It's made up of different gases, including nitrogen, oxygen, and a small amount of other gases like carbon dioxide and methane.

The atmosphere is really important because it protects us from harmful radiation from the sun and helps regulate the temperature of the Earth. It's kind of like a blanket that keeps the Earth warm and cozy.

However, when we emit too many greenhouse gases into the atmosphere, it can cause the Earth to get too warm. This can lead to all kinds of problems, like more extreme weather events, rising sea levels, and changes to ecosystems around the world.

So, it's important to understand emissions and the atmosphere so that we can work to reduce our greenhouse gas emissions and protect our planet for future generations.

**Discussion Question:** How can we as individuals reduce our personal emissions, and what are some practical steps we can take to make a positive impact on the environment?

# Introduction to the BEACO2N Dataset

BEACO2N (Berkeley Environmental Air-quality & CO2 Network) is a new strategy for understanding greenhouse gases (GHGs) and air quality at street level in near real time, giving pedestrians, companies, and policy-makers unique insight into their GHG emissions and air quality experiences.

Through their technology, BEACO2N is able to create a highly detailed map of CO2 and pollutants in our air. The data provides a clear route to evaluating the effectiveness of local and regional efforts to reduce GHG emissions, improve air quality, improve environmental equity and reduce the detrimental effects of emissions on public health.

### Who collected the data?

BEACO2N data is collected by a network of sensors, also known as "nodes", that are deployed in various locations. The nodes are part of a collaborative effort between different organizations, including academic institutions, government agencies, and non-profit organizations. The data collected by the nodes is made available to researchers and the public for analysis and use in developing solutions for climate change mitigation and adaptation.

### How was the data collected?

BEACO2N blankets interesting locations with a network of sensors, called **"nodes"**, approximately 1 mile (2 km) apart from each other to measure greenhouse gases and air quality. Although the individual nodes are less precise than highly sensitive traditional sensors, when working as part of a network they create a highly detailed map of CO2 and pollutants in our air. The nodes sample the air for 6 gases and also aerosol in the same locations, every minute of the day.

**CO2**, or carbon dioxide, is typically measured in parts per million (ppm) or parts per billion (ppb). This is because CO2 is a trace gas in the Earth's atmosphere, meaning it makes up only a small fraction of the gases in the air. To measure the concentration of CO2 in the atmosphere, scientists use instruments like infrared gas analyzers that can detect and quantify the amount of CO2 in a sample of air. 
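As a quick illustration of what these units mean, the sketch below converts a concentration in ppm to ppb and to a plain fraction of the air. The 420 ppm value is a made-up example reading, not a number from the BEACO2N dataset.

```python
def ppm_to_ppb(ppm):
    """1 part per million equals 1,000 parts per billion."""
    return ppm * 1_000

def ppm_to_fraction(ppm):
    """Parts per million as a plain fraction (1 ppm = 1 / 1,000,000)."""
    return ppm / 1_000_000

example_ppm = 420.0  # hypothetical CO2 reading, for illustration only

print(f"{example_ppm} ppm = {ppm_to_ppb(example_ppm):,.0f} ppb")
print(f"{example_ppm} ppm = {ppm_to_fraction(example_ppm):.4%} of the air")
```

Seen as a percentage, the reading is a tiny share of the air, which is why CO2 is called a trace gas.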

### What is represented in the data?

The locations BEACO2N tracks have nodes that contain sensors for **CO2, NO, NO2, O3, CO, and aerosol** in addition to sensors for **temperature, pressure, and relative humidity**. Data from these sensors are collected once every five seconds onto a miniature computer, which then sends the data to a centralized server. When combined with data from other nodes, these measurements can be used to produce concentration maps, track pollution plumes, and constrain calculations of emissions, to name a few possibilities.

**We will be using a dataset of CO2 measurements from the beginning of 2022 to the beginning of 2023: an entire year.**

http://beacon.berkeley.edu/about/

![Screen%20Shot%202023-03-07%20at%204.36.14%20PM.png](attachment:Screen%20Shot%202023-03-07%20at%204.36.14%20PM.png)

Click the generated link "Exploratorium Bay (2022-01-01 23:00:00-2023-01-01 23:00:00)" to see the dataset for yourself!

```python
from IPython.display import IFrame

# Embed the BEACO2N "about" page in the notebook (900 px wide, 500 px tall).
IFrame("http://beacon.berkeley.edu/about/", width=900, height=500)
```

### BEACO2N CSV Dataset Explained

#### What is a CSV?

A CSV (Comma Separated Values) file is a simple text file format used to store tabular data, which is commonly used in data science. In a CSV file, each row represents a single data record, and each column represents a specific attribute or feature of that record.

The values in a CSV file are separated by commas, which means that each comma separates a different column of data. The first row of a CSV file typically contains column headings, which describe the data in each column.

CSV files can be easily read and written by many software tools, including spreadsheet applications like Microsoft Excel, Google Sheets, or Python libraries like pandas. They are used in data science because they are a lightweight, easy-to-use format for storing and sharing large amounts of data, and they can be easily processed and analyzed by many programming languages and tools.
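Here is a minimal sketch of what reading a CSV looks like in Python, using only the built-in `csv` module. The column names are borrowed from the BEACO2N file described later in this notebook, but the rows themselves (including the QC level values) are made up for illustration.

```python
import csv
import io

# A tiny CSV written as text: the first row holds the column headings,
# and every following row is one record. These values are hypothetical.
sample_csv = """local_timestamp,node_id,CO2_ppm,CO2_QC_level
2022-01-01 00:00:00,1,431.2,2
2022-01-01 01:00:00,1,429.8,2
2022-01-01 02:00:00,1,430.5,2
"""

# DictReader uses the heading row as keys, so each row becomes a dictionary.
reader = csv.DictReader(io.StringIO(sample_csv))
rows = list(reader)

print("Columns:", reader.fieldnames)
for row in rows:
    # Every value is read as text, so convert numbers before doing math.
    print(row["local_timestamp"], float(row["CO2_ppm"]))
```

With a real file on disk you would pass an opened file to `csv.DictReader` instead of the `io.StringIO` wrapper used here.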

#### What are examples of other types of files?

Working with different file types is an essential part of data science as it involves importing, exporting, and manipulating data from various sources. Here are some commonly used file types in data science and their explanations:
1. CSV (Comma Separated Values) - CSV files are commonly used for storing tabular data where each row represents an observation and each column represents a variable. CSV files can be easily imported into various data analysis tools like Excel, R, or Python.
2. Excel - Excel files (.xlsx) are commonly used for storing data in tabular form. Excel files can be easily imported into R or Python using dedicated libraries like readxl in R or openpyxl in Python.
3. JSON (JavaScript Object Notation) - JSON files are used for storing structured data that can be easily understood by both humans and machines. JSON files are commonly used to store data that is transmitted between web applications.
4. XML (Extensible Markup Language) - XML files are used for storing structured data that can be easily understood by machines. XML files are commonly used to store data that is transmitted between web applications.
5. SQL (Structured Query Language) - SQL is used for managing relational databases. SQL files can be used to store data and query data from databases.
6. TXT (Text) - TXT files are used for storing unstructured text data. TXT files can be easily imported into R or Python using the readr package in R or the open() function in Python.

In summary, data scientists need to be familiar with different file types to be able to work with data from various sources and manipulate them for analysis.
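To make the contrast between formats concrete, the sketch below takes one record and stores it as JSON instead of as a CSV row, using Python's built-in `json` module. The record is hypothetical and only meant to show the shape of the format.

```python
import json

# One made-up record, written as a Python dictionary.
record = {
    "local_timestamp": "2022-01-01 00:00:00",
    "node_id": 1,
    "CO2_ppm": 431.2,
}

# Convert the dictionary to JSON text, then read it back again.
json_text = json.dumps(record, indent=2)
restored = json.loads(json_text)

print(json_text)
print("Type of CO2_ppm after loading:", type(restored["CO2_ppm"]).__name__)
```

Unlike CSV, where every value is read back as text, JSON keeps the difference between numbers and strings.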


#### What is inside the BEACO2N Dataset?

- Each row (or line) of the file is a record of the measurements for one date and time.
- Each column is a particular variable that was measured.

Relevant columns for our analysis:

- **local_timestamp**: The time of the record, in Pacific Time
- **node_id**: The identification number assigned to each node
- **CO2_ppm**: The CO2 concentration in parts per million (ppm), adjusted for standard temperature and pressure
- **CO2_QC_level**: The quality control level of the CO2 record
 
Averages have been calculated by taking all the measurements for a whole hour and assigning the result to the beginning of that hour. So the 12 AM record includes measurements from 12:00:00 to 12:59:59.
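The hourly averaging described above can be sketched as follows. This is only an illustration of the idea, not the code BEACO2N actually uses, and the readings are made up.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw readings: (timestamp, CO2 in ppm).
readings = [
    ("2022-01-01 00:05:00", 430.0),
    ("2022-01-01 00:35:00", 432.0),
    ("2022-01-01 00:59:59", 434.0),
    ("2022-01-01 01:10:00", 428.0),
    ("2022-01-01 01:50:00", 426.0),
]

def hourly_means(readings):
    """Group readings by the start of their hour and average each group."""
    buckets = defaultdict(list)
    for stamp, ppm in readings:
        t = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        # Round down to the beginning of the hour, e.g. 00:35:00 -> 00:00:00.
        hour_start = t.replace(minute=0, second=0, microsecond=0)
        buckets[hour_start].append(ppm)
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}

means = hourly_means(readings)
for hour, mean_ppm in means.items():
    print(hour, round(mean_ppm, 1))
```

Notice that the reading taken at 00:59:59 still counts toward the 12 AM record, because it is rounded down to the start of its hour.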

![Screen%20Shot%202023-03-07%20at%204.54.02%20PM.png](attachment:Screen%20Shot%202023-03-07%20at%204.54.02%20PM.png)

#### Important Terms to Describe Data

Data faithfulness refers to the degree to which data accurately represents the underlying phenomenon it purports to describe. The following terms are important to consider when discussing data faithfulness:

**Structure**: This refers to the organization of the data, including the format, schema, and relationships between different elements. Structured data is typically easier to analyze and interpret than unstructured data.

**Granularity**: This refers to the level of detail in the data. Higher granularity means more detailed data, which may be useful in certain contexts but may also increase the risk of identifying individuals or sensitive information.

**Scope**: This refers to the range of data being considered. For example, data may be limited to a specific geographic area or time period, which can impact its representativeness.

**Temporality**: This refers to the time dimension of the data, including when it was collected, how frequently it is updated, and whether it captures changes over time. Temporal consistency is important for longitudinal analyses.

**Faithfulness**: This refers to the accuracy and reliability of the data, including how it was collected, whether there were biases in the sampling or measurement processes, and whether the data accurately reflects the phenomenon being studied. High data faithfulness means that the data accurately represents the underlying reality, while low data faithfulness may lead to incorrect or biased conclusions.
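Some of these terms can be checked directly from a dataset's timestamps. The sketch below, which uses made-up timestamps, reports the scope (first and last record), the granularity (the smallest gap between records), and any missing records that might affect faithfulness.

```python
from datetime import datetime

# Hypothetical hourly timestamps with one hour missing (02:00).
stamps = [
    "2022-01-01 00:00:00",
    "2022-01-01 01:00:00",
    "2022-01-01 03:00:00",
    "2022-01-01 04:00:00",
]
times = sorted(datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in stamps)

# Scope: the time range the data covers.
scope = (times[0], times[-1])

# Granularity: the smallest gap between two consecutive records.
granularity = min(later - earlier for earlier, later in zip(times, times[1:]))

# Temporality / faithfulness: which expected records are absent?
seen = set(times)
missing = []
t = times[0]
while t <= times[-1]:
    if t not in seen:
        missing.append(t)
    t += granularity

print("Scope:", scope[0], "to", scope[1])
print("Granularity:", granularity)
print("Missing records:", missing)
```

Gaps like this do not make a dataset useless, but they are worth knowing about before you draw conclusions from it.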
