# Basic confidence intervals

1. Create the lower and upper 95% interval boundaries:
   - Create the lower boundary by subtracting 1.96 standard errors ('std_err') from the 'mean' of estimates.
   - Create the upper boundary by adding 1.96 standard errors ('std_err') to the 'mean' of estimates.
2. Pass pollutant as the faceting variable to sns.FacetGrid() and unlink the x-axes of the plots so intervals are all well-sized.
3. Pass the constructed interval boundaries to the mapped plt.hlines() function.

In [None]:
# Construct CI bounds for averages
average_ests['lower'] = average_ests['mean'] - 1.96*average_ests['std_err']
average_ests['upper'] = average_ests['mean'] + 1.96*average_ests['std_err']

# Setup a grid of plots, with non-shared x axes limits
g = sns.FacetGrid(average_ests, row = 'pollutant', sharex = False)

# Plot CI for average estimate
g.map(plt.hlines, 'y', 'lower', 'upper')

# Plot observed values for comparison and remove axes labels
g.map(plt.scatter, 'seen', 'y', color = 'orangered').set_ylabels('').set_xlabels('') 

plt.show()
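
The multiplier 1.96 used above, like the 2.58 used for 99% intervals later in these notes, is a two-sided standard-normal quantile. As a minimal sketch (the helper name `z_multiplier` is hypothetical and not part of the original exercise), the multipliers can be recovered with Python's standard library:

```python
from statistics import NormalDist

def z_multiplier(level):
    """Two-sided standard-normal multiplier for a confidence level such as 0.95."""
    alpha = 1 - level
    # Leave alpha/2 of the probability in each tail
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: {z_multiplier(level):.3f}")
# 90%: 1.645
# 95%: 1.960
# 99%: 2.576
```

This assumes the estimates are approximately normally distributed, which is the same assumption behind the mean ± z × std_err construction.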



# Annotating confidence intervals

- Provide starting and ending limits (columns lower and upper) for your confidence intervals to plt.hlines().
- Set interval thickness to 5.
- Draw a vertical line representing a difference of 0 with plt.axvline().
- Color the null line 'orangered' to make it stand out.

In [None]:
# Set start and ends according to intervals 
# Make intervals thicker
plt.hlines(y = 'year', xmin = 'lower', xmax = 'upper', 
           linewidth = 5, color = 'steelblue', alpha = 0.7,
           data = diffs_by_year)
# Point estimates
plt.plot('mean', 'year', 'k|', data = diffs_by_year)

# Add a 'null' reference line at 0 and color orangered
plt.axvline(x = 0 , color = 'orangered', linestyle = '--')

# Set descriptive axis labels and title
plt.xlabel('95% CI')
plt.title('Avg SO2 differences between Cincinnati and Indianapolis')
plt.show()

![image-2](image-2.png)


# Making a confidence band

- Construct lower and upper 99% interval bands by subtracting 2.58 standard errors from the mean and adding 2.58 standard errors to it.
- Make the point-estimate line white.
- Make the point-estimate line semi-transparent by setting alpha to 0.4.
- Tell plt.fill_between() what values to fill between for each day.

In [None]:
# Draw 99% interval bands for average NO2
vandenberg_NO2['lower'] = vandenberg_NO2['mean'] - 2.58*vandenberg_NO2['std_err']
vandenberg_NO2['upper'] = vandenberg_NO2['mean'] + 2.58*vandenberg_NO2['std_err']

# Plot mean estimate as a white semi-transparent line
plt.plot('day', 'mean', data = vandenberg_NO2,
         color = 'white', alpha = 0.4)

# Fill between the upper and lower confidence band values
plt.fill_between(x = 'day', 
                 y1 = 'lower', y2 = 'upper', 
                 data = vandenberg_NO2)

plt.show()

![image-3](image-3.png)


# Separating a lot of bands

- Set up a facet grid to separate the plots by the city column in eastern_SO2.
- Send the confidence interval plotting function to map().
- Color the confidence intervals 'coral'.
- Help the overlaid mean line drawn with g.map(plt.plot,...) stand out against the confidence bands by coloring it white.

In [None]:
# Setup a grid of plots with columns divided by location
g = sns.FacetGrid(eastern_SO2, col = 'city', col_wrap = 2)

# Map interval plots to each city's data with coral-colored ribbons
g.map(plt.fill_between, 'day', 'lower', 'upper', color = 'coral')

# Map overlaid mean plots with white line
g.map(plt.plot, 'day', 'mean', color = 'white')

plt.show()

![image-4](image-4.png)


# Cleaning up bands for overlaps

- Filter SO2_compare to the city currently selected by the for loop.
- Color both the intervals and mean lines with the color accompanying each city.
- Lower the interval and mean line opacities to 0.4 and 0.25, respectively.
- Override the default legend labels in plt.plot() by setting the label argument to the city name.

In [None]:
for city, color in [('Denver',"#66c2a5"), ('Long Beach', "#fc8d62")]:
    # Filter data to desired city
    city_data = SO2_compare[SO2_compare.city  ==  city]

    # Set city interval color to desired and lower opacity
    plt.fill_between(x = 'day', y1 = 'lower', y2 = 'upper', data = city_data,
                     color = color, alpha = 0.4)
    
    # Draw a faint mean line for reference and give a label for legend
    plt.plot('day','mean', data = city_data, label = city,
             color = color, alpha = 0.25)

plt.legend()
plt.show()

![image-5](image-5.png)


# 90, 95, and 99% intervals

- Fill in the appropriate interval width percents (from 90, 95, and 99%) to match the values in the alphas list.
- In the for loop, color the interval by its assigned color.
- Pass the loop's width percentage value to plt.hlines() to label the legend.

In [None]:
# Add interval percent widths
alphas = [     0.01,  0.05,   0.1] 
widths = [ '99% CI', '95%', '90%']
colors = ['#fee08b','#fc8d59','#d53e4f']

for alpha, color, width in zip(alphas, colors, widths):
    # Grab confidence interval
    conf_ints = pollution_model.conf_int(alpha)
    
    # Pass current interval color and legend label to plot
    plt.hlines(y = conf_ints.index, xmin = conf_ints[0], xmax = conf_ints[1],
               colors = color, label = width, linewidth = 10) 

# Draw point estimates
plt.plot(pollution_model.params, pollution_model.params.index, 'wo', label = 'Point Estimate')

plt.legend()
plt.show() 

![image-6](image-6.png)
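
One way to keep the legend labels from drifting out of sync with the `alphas` list is to derive them, since a call such as `conf_int(alpha)` returns a (1 - alpha) interval. A small sketch, assuming `pollution_model` is a statsmodels-style results object and that plain percent labels are acceptable:

```python
# Significance levels, widest interval first so narrower ones are drawn on top
alphas = [0.01, 0.05, 0.1]

# A (1 - alpha) interval gets the matching percent label
widths = [f"{1 - alpha:.0%}" for alpha in alphas]

print(widths)  # ['99%', '95%', '90%']
```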


# 90 and 99% bands

- Set the opacity of the intervals to 40%.
- Calculate the lower and upper confidence bounds.

In [None]:
int_widths = ['90%', '99%']
z_scores = [1.645, 2.58]
colors = ['#fc8d59', '#fee08b']

for percent, Z, color in zip(int_widths, z_scores, colors):
    
    # Pass lower and upper confidence bounds and lower opacity
    plt.fill_between(
        x = cinci_13_no2.day, alpha = 0.4, color = color,
        y1 = cinci_13_no2['mean'] + Z*cinci_13_no2['std_err'],
        y2 = cinci_13_no2['mean'] - Z*cinci_13_no2['std_err'],
        label = percent)
    
plt.legend()
plt.show()

![image-7](image-7.png)


# Using band thickness instead of coloring

- Use a thickness of 15 for 90%, 10 for 95%, and 5 for 99% interval lines.
- Pass the interval thickness value to plt.hlines().
- Set the interval color to 'gray' to lighten contrast.

In [None]:
# Decrease interval thickness as interval widens
sizes =      [  15, 10, 5]
int_widths = ['90% CI', '95%', '99%']
z_scores =   [   1.645,  1.96,  2.58]

for percent, Z, size in zip(int_widths, z_scores, sizes):
    plt.hlines(y = rocket_model.pollutant, 
               xmin = rocket_model['est'] - Z*rocket_model['std_err'],
               xmax = rocket_model['est'] + Z*rocket_model['std_err'],
               label = percent, 
               # Resize lines and color them gray
               linewidth = size, 
               color = 'gray') 
    
# Add point estimate
plt.plot('est', 'pollutant', 'wo', data = rocket_model, label = 'Point Estimate')
plt.legend(loc = 'center left', bbox_to_anchor = (1, 0.5))
plt.show()

![image-8](image-8.png)


# The bootstrap histogram

- Provide the percentile() function with the upper and lower percentiles needed to get a 95% interval.
- Shade the background of the plot in the 95% interval.
- Draw histogram of bootstrap means with 100 bins.

In [None]:
cinci_may_NO2 = pollution.query("city  ==  'Cincinnati' & month  ==  5").NO2

# Generate bootstrap samples
boot_means = bootstrap(cinci_may_NO2, 1000)

# Get lower and upper 95% interval bounds
lower, upper = np.percentile(boot_means, [2.5,97.5])

# Plot shaded area for interval
plt.axvspan(  lower, upper, color = 'gray', alpha = 0.2)

# Draw histogram of bootstrap means
sns.histplot(boot_means, bins = 100)

plt.show()

![image-9](image-9.png)
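
The `bootstrap()` helper called above is not defined in these notes. A minimal sketch of what it might look like, assuming it returns one mean per resample drawn with replacement (the `seed` parameter and the synthetic `fake_no2` data are illustrative additions, not part of the original exercise):

```python
import numpy as np

def bootstrap(data, n_boots, seed=None):
    """Return n_boots means, each computed from a resample of data drawn with replacement."""
    rng = np.random.default_rng(seed)
    values = np.asarray(data, dtype=float)
    return np.array([
        rng.choice(values, size=len(values), replace=True).mean()
        for _ in range(n_boots)
    ])

# Synthetic stand-in for the NO2 readings (not real measurements)
fake_no2 = np.random.default_rng(0).normal(loc=20, scale=5, size=200)

boot_means = bootstrap(fake_no2, 1000, seed=1)

# Percentile interval: the middle 95% of the bootstrap means
lower, upper = np.percentile(boot_means, [2.5, 97.5])
```

Unlike the mean ± z × std_err intervals earlier, the percentile interval makes no normality assumption; it reads the bounds directly off the resampled means.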


# Bootstrapped regressions 



In [None]:
sns.lmplot(x = 'NO2', y = 'SO2', data = no2_so2_boot,
           # Tell seaborn to draw a regression line for each sample
           hue = 'sample', 
           # Make lines blue and transparent
           line_kws = {'color': 'steelblue', 'alpha': 0.2},
           # Disable built-in confidence intervals
           ci = None , legend = False, scatter = False)

# Draw scatter of all points
plt.scatter('NO2', 'SO2', data = no2_so2)

plt.show()

![image-10](image-10.png)
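
The notes do not show how `no2_so2_boot` was built. One plausible construction, sketched here under the assumption that each bootstrap sample is a row-level resample of `no2_so2` tagged with a `sample` id (the synthetic data and the choice of 70 resamples are illustrative only):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for no2_so2 (not real measurements)
rng = np.random.default_rng(0)
no2 = rng.uniform(5, 40, size=50)
no2_so2 = pd.DataFrame({'NO2': no2,
                        'SO2': 0.3 * no2 + rng.normal(0, 2, size=50)})

n_samples = 70  # hypothetical number of bootstrap resamples

# Resample rows with replacement and label each resample for use as `hue`
no2_so2_boot = pd.concat(
    [no2_so2.sample(n=len(no2_so2), replace=True, random_state=i).assign(sample=i)
     for i in range(n_samples)],
    ignore_index=True)
```

Each value of `sample` then gets its own regression line, and the spread of those lines gives a visual sense of the uncertainty in the fit.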


# Lots of bootstraps with beeswarms


- Run bootstrap resampling on each city_NO2 vector.
- Add city name as a column in the bootstrap DataFrame, cur_boot.
- Color all swarm plot points 'coral' to avoid the color-size problem.


In [None]:
# Initialize a holder DataFrame for bootstrap results
city_boots = pd.DataFrame()

for city in ['Cincinnati', 'Des Moines', 'Indianapolis', 'Houston']:
    # Filter to city
    city_NO2 = pollution_may[pollution_may.city  ==  city].NO2
    # Bootstrap city data & put in DataFrame
    cur_boot = pd.DataFrame({'NO2_avg': bootstrap(city_NO2, 100), 'city': city})
    # Append to the other cities' bootstraps
    city_boots = pd.concat([city_boots,cur_boot])

# Beeswarm plot of averages with cities on y axis
sns.swarmplot(y = "city", x = "NO2_avg", data = city_boots, color = 'coral')

plt.show()

![image-11](image-11.png)
