In these projects I either pursued an interesting line of inquiry or devised a tool I thought would be useful. They were built with R, Python, MySQL, Excel VBA, JavaScript, jQuery, or some combination of these, drawing on skills learned in courses on the edX and Coursera platforms.
Inspired by the SI yearbooks for college and pro sports, the project documents the performance of Puerto Rico's college volleyball players throughout the United States. Maps were created using R, while the document itself was largely created using the Python API for Scribus, a desktop publishing application.
This notebook uses SAS and PROC SQL to visualize the performance of the Florida Retirement System pension fund, one of the largest in the United States. The data is from the Center for Retirement Research at Boston College, which tracks 180 pension funds across the United States. The performance of the fund is also compared to that of other large funds.
This notebook uses SAS to compare the grammatical proficiency of English learners in the United States to that of learners in other countries of the Anglosphere. The dataset came from three Boston-area professors who collected and analyzed data from more than 600,000 people who took an online English grammar quiz.
Interactive visualization of the graduation rates of all public high schools in Puerto Rico, across regions, districts, and cities, using R and Shiny server.
An interactive web app built with R and Shiny server from ~69,000 records in Puerto Rico's Open Data Portal, covering students admitted to the UPR campuses over a 5-year period. The app helps educators and future applicants visualize the qualifications of students admitted to a given UPR campus in a given year. It also shows the most popular majors among male and female students, as well as the most selective ones overall. In addition, it graphs the high schools that most often send students to a given campus and the top-performing schools in Puerto Rico.
Tableau visualization of Orlando real estate sales by ZIP code, using data provided by the Orlando Regional Realtor Association.
Tableau visualization of 20 years of Orlando real estate sales, using data provided by the Orlando Regional Realtor Association.
This is the final project for Python for Data Journalists: Analyzing Money in Politics, a course offered by the Knight Center for Journalism in the Americas. The project summarizes and plots the sources and amount of funding received by the various ballot measures from the November 8, 2016 election in California, as well as the voting results, using Python, Jupyter, pandas, numpy, and matplotlib.
Heatmaps and boxplots dashboard for an L.A. County employee salaries dataset that includes employees’ salaries and benefits for the years 2013-2015, using R, the rbokeh package, and Shiny server.
A whimsical look at the 1980 Major League Baseball season using MySQL, R, and the 2016 Lahman database, which has baseball data going back to 1871. In 1980, baseball was a big deal.
A visualization of the median salaries of recent graduates of about 170 majors, and the degree of women's participation in each major, using Python, Jupyter, pandas, numpy, and matplotlib.
An exploration and visualization of who shops the Black Friday sales on Thanksgiving Day, using Python, Jupyter, pandas, numpy, and matplotlib.
What lookup functions and pivot tables in spreadsheets can do for us and their equivalents in R, using R notebook, Excel, MySQL (RMySQL), and XAMPP.
Using Afrobarometer's 2016 poll data and Excel VBA to gain some insight into Africans' views of China.
Used Microsoft SQL Server, Power BI, and R to explore and visualize insurance premium data from 2014 to 2019 downloaded from Healthcare.gov's data website. A SQL Server instance was set up and Transact-SQL queries were run against it to extract the relevant data, which was then visualized using Power BI and R.
These interactive maps plot police activity within a given radius of a location. The user can specify types of incidents, days of the week, and times of day to refine results. Each map also displays density maps, faceted bar plots, and contingency tables. They were put together using R and Shiny server, while the data was pre-processed using Python.
- Orlando police calls map (2009-2015, 3 million records)
- Puerto Rico crime map (2012-2015, 220,000 records)
- Los Angeles crime map (2004-2015, 2 million records)
- Chicago crime map (2001-2016, 14 million records)
- San Francisco crime map (2003-2016, 2 million records)
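The radius filter these maps are built around can be sketched in a few lines of Python. This is only an illustration, not the apps' actual R and Shiny code: the function names, the dictionary layout of an incident record, and the choice of the haversine great-circle distance are all assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def incidents_within(incidents, center, radius_km, kinds=None, weekdays=None):
    """Keep incidents inside the radius, optionally filtered by type and weekday.

    `incidents` is a list of dicts with hypothetical keys
    'lat', 'lon', 'kind', and 'weekday'.
    """
    lat0, lon0 = center
    selected = []
    for row in incidents:
        if kinds is not None and row["kind"] not in kinds:
            continue
        if weekdays is not None and row["weekday"] not in weekdays:
            continue
        if haversine_km(lat0, lon0, row["lat"], row["lon"]) <= radius_km:
            selected.append(row)
    return selected


# Made-up sample records near downtown Orlando (not real data).
SAMPLE = [
    {"lat": 28.5383, "lon": -81.3792, "kind": "theft", "weekday": "Mon"},
    {"lat": 28.5483, "lon": -81.3792, "kind": "noise", "weekday": "Sat"},
    {"lat": 29.5000, "lon": -81.3000, "kind": "theft", "weekday": "Mon"},
]
nearby = incidents_within(SAMPLE, (28.5383, -81.3792), radius_km=2.0)
```

With millions of records, a vectorized or spatially indexed version would be preferable to this row-by-row loop.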
Tableau visualization of a Kaggle dataset that includes sales of some 16,000 video games released between the 1980s and 2016.
This notebook describes the pre-processing, using R notebook, applied to a dataset that would eventually be used in the Orlando police calls map.
These statistical inference projects were done using R notebook.
An R notebook that infers the true average number of hours worked by Americans, based on the 2016 General Social Survey.
An R notebook that infers the true proportion of Americans working full time, based on the 2016 General Social Survey.
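Inference about a population proportion of this kind usually comes down to a confidence interval around the sample proportion. The sketch below shows one common approach, the normal-approximation (Wald) interval, in Python rather than R. It is not taken from the notebook: the function name is hypothetical, the counts in the usage line are made up, and the notebook may use a different method.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * standard_error, p_hat + z * standard_error


# Hypothetical counts, not GSS results: 500 of 1,000 respondents.
low, high = proportion_ci(500, 1000)
```

The approximation is reasonable when both the number of successes and the number of failures are comfortably large; for small samples an exact or Wilson interval is a safer choice.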
An R notebook that infers the true difference in mean self-ranking between two populations: Americans who voted for Mitt Romney in the 2012 presidential election, and those who voted for Barack Obama.
An R notebook that makes inferences about the true difference in proportion of gun ownership between two populations: Americans who don’t live within a 1-mile radius of an area they fear, and Americans who do.
Devising a strategy to invest in Prosper loans using logistic regression, R notebook, and the caret and ROCR packages.
Predicting the severity of mammography assessments with decision trees, using R notebook and the rpart package.
Visualizing and classifying tweets using Support Vector Machines via R notebook and the tm, SnowballC, wordcloud and e1071 packages.
A colorful 3rd-order PLL design tool in Python/JavaScript. Computes the loop filter component values, plots open- and closed-loop responses and output-referred noise, computes RMS phase and frequency errors and jitter, plots the time response, and computes various lock times. Plots and tabulates extensive results on a web page or, alternatively, generates a complete Excel report for download and further computations. The app is available in Simplified Chinese as well: 体验一下吧 ("give it a try"). The app is hosted on Google App Engine. In the back end, the app uses Python with the numpy, xlrd, and xlwt modules and the Jinja2 templating engine. In the front end, it uses the HTML5 stack: HTML, CSS, and JavaScript, plus Google Charts.
A vibrant Smith chart impedance matching tool, hosted on Google App Engine and built with jQuery and HTML Canvas, that helps designers match a given impedance ZL at a given frequency to a given characteristic impedance Zo. Computes the equivalent input impedance and the amplitude and phase of the reflection coefficient, and plots them on the Smith chart. Can use Z, Y, or ZY Smith charts.
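The reflection coefficient such a tool plots follows from the standard relation Γ = (ZL − Zo) / (ZL + Zo). A minimal Python sketch of that calculation is shown here for illustration; the tool itself runs on jQuery and HTML Canvas, and the function names and the 50-ohm default are assumptions.

```python
import cmath
import math


def reflection_coefficient(z_load, z0=50.0):
    """Complex reflection coefficient of load z_load referenced to z0 (ohms)."""
    return (z_load - z0) / (z_load + z0)


def magnitude_and_phase_deg(gamma):
    """Amplitude and phase (in degrees) of a complex reflection coefficient."""
    return abs(gamma), math.degrees(cmath.phase(gamma))


# A matched load gives no reflection; a purely reactive load reflects fully.
gamma_matched = reflection_coefficient(50 + 0j)
gamma_reactive = reflection_coefficient(0 + 50j)
mag, phase = magnitude_and_phase_deg(gamma_reactive)
```

On the Smith chart, Γ is the position of the point itself: its magnitude is the distance from the center and its phase is the angle around the chart.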