This repo contains all of the practical exercises I completed during the Data Analytics Bootcamp at Ironhack in Mexico City. The part-time course (400+ hours) lasted 6 months (September 2020 to March 2021) and was divided into 3 modules:

- Module 1 - Data Extraction & Transformation
- Module 2 - Data Analysis & Visualization
- Module 3 - Data Modeling

The table below is an index of all exercises ("labs"), grouped by bootcamp module and week. Each row contains a link to the exercise, the programming language, the libraries used, and the main topics covered or methods I used to solve the problems.

Module | Lab | Language | Libraries | Topics/Methods |
---|---|---|---|---|
M1 | resolving-git-conflicts | Git, Command Line, Bash | - | GitHub, add, commit, push, pull, merge, conflicts, pull requests |
M1 | tuple-set-dict | Python | random, operator, pandas | random.sample, operator.itemgetter, pd.DataFrame |
M1 | list-comprehensions | Python | os, numpy, pandas | os.listdir, os.path.join, pd.concat, np.array, _get_numeric_data |
M1 | string-operations | Python | re, math | f-strings, str.lower, str.endswith, str.join, str.split, str.replace, re.findall, re.search, bag of words |
M1 | advanced-regex-expressions | Python | re | re.findall, re.sub |
M1 | lambda-functions | Python | - | functions, lambda, zip, sorted, dict.items |
M1 | numpy | Python | numpy | np.random (random, rand, sample), np.ones, size, shape, np.reshape, np.transpose, np.array_equal, max, min, mean, np.empty, np.nditer |
M1 | functions | Python | iter | functions, iterators, generators, yield |
M1 | intro-pandas | Python | pandas, numpy | pd.Series, pd.DataFrame, df.columns, subsetting, df.mean, df.max, df.median, df.sum |
M1 | error-handling | Python | math | try-except, if-else, functions |
M1 | object-oriented-programming | Python | - | objects, functions |
M1 | map-reduce-filter | Python | numpy, pandas, functools | functions, map, reduce, filter |
M1 | import-export | Python | pandas | pd.read_csv, df.to_csv, pd.read_excel, df.head, df.value_counts |
M1 | dataframe-calculations | Python | pandas, numpy, zipfile | df.shape, df.unique, str.contains, df.astype, df.isnull, df.apply, df.sort_values, df.equals, pd.get_dummies, df.corr, df.drop, df.groupby.agg, df.quantile |
M1 | my-sql-select | SQL | - | aliases, inner join, left join, sum, coalesce |
M1 | my-sql | SQL | - | db design, table relationships, db seeding, forward engineering schemas, one-to-many, many-to-one, many-to-many, linking tables |
M1 | advanced-mysql | SQL | - | temporary tables, subqueries, permanent tables |
M1 | mongo db | MongoDB | - | - |
M1 | web-scraping | Python, APIs | requests, beautifulsoup, tweepy | requests.get, requests.get.content, BeautifulSoup, soup.find_all, soup.tag.text, soup.tag.get, soup.tag.find, tweepy.get_user, tweepy.user_timeline, tweepy.user.statuses_count, tweepy.user.followers_count |
M1 | web-scraping-deep dive | Python, APIs | requests, beautifulsoup, tweepy | requests.get, requests.get.content, BeautifulSoup, soup.find_all, soup.tag.text, soup.tag.get, soup.tag.find, tweepy.get_user, tweepy.user_timeline, tweepy.user.statuses_count, tweepy.user.followers_count |
M1 | parsing-api | Python, APIs | requests, pandas | requests.get, requests.get.content |
M1 | api-scavenger | Python, APIs, Command Line | pandas, pandas.io.json | curl, pd.read_json, json_normalize, pd.to_datetime |
M1 | parsing-rss-feeds | Python | pandas, feedparser | feedparser.parse |
M1 | data-cleaning | Python | pandas, sqlalchemy, pymysql | create_engine, pd.read_sql_query |
M2 | subsetting-and-descriptive-stats | Python | pandas, matplotlib, seaborn | df.loc, df.groupby.agg, df.quantile, df.describe, random.choice, plt.hist, plt.vlines, np.mean, np.std |
M2 | df-calculation-and-transformation | Python | pandas, matplotlib | pd.get_dummies, pd.concat, df.corr |
M2 | pandas-deep-dive | Python | pandas | df.describe, df.groupby.agg, df.apply |
M2 | intro-to-scipy | Python | scipy, numpy | stats.tmean, stats.fisher_exact, scipy.interpolate, interpolate.interp1d, np.arange |
M2 | pivot-table-and-correlation | Python | pandas, scipy.stats | df.pivot_table(index, columns, aggfunc), stats.linregress, plt.scatter, stats.pearsonr, stats.spearmanr |
M2 | matplotlib-seaborn | Python | matplotlib.pyplot, seaborn, numpy, pandas | plt.plot, plt.show, plt.subplots, plt.legend, plt.bar, plt.barh, plt.pie, plt.boxplot, plt.xticks, ax.set_title, ax.set_xlabel, sns.set, sns.distplot, sns.barplot, sns.despine, sns.violinplot, sns.catplot, sns.heatmap, np.linspace, df.select_dtypes, pd.Categorical, df.cat.codes, np.triu, sns.diverging_palette |
M2 | plotting-multiple-data-series | Python | matplotlib.pyplot, seaborn, numpy, pandas | df.groupby().sum().plot(), df.groupby().mean().plot(), pd.pivot_table() |
M2 | introduction-to-powerbi-and-tableau | Tableau, PowerBI | - | - |
M2 | tableau | Tableau | - | - |
M2 | discrete-probability-distribution | Python | scipy.stats, numpy | stats.binom, stats.poisson |
M2 | continuous-probability-distribution | Python | scipy.stats, numpy | stats.uniform, stats.norm, stats.expon, np.random.exponential, stats.rvs, stats.cdf, stats.pdf, stats.ppf |
M2 | calculating-odds | Python | scipy.stats, numpy | comb |
M2 | hypothesis-testing | Python | scipy.stats, numpy, pandas, statsmodels | stats.ttest_1samp, stats.sem, stats.t.interval, pd.crosstab, statsmodels.proportions_ztest |
M2 | two-sample-hypothesis-tests | Python | pandas, scipy.stats | stats.ttest_ind, stats.ttest_rel, stats.ttest_1samp, stats.chi2_contingency, np.where |
M2 | correlation-tests-with-scipy | Python | pandas, scipy.stats, statsmodels.api | statsmodels.api.stats.anova_lm |
M2 | regression-analysis | Python | numpy, pandas, scipy, sklearn.linear_model, matplotlib, seaborn | plt.scatter, df.corr, scipy.stats.linregress, sns.heatmap, sklearn.LinearRegression, lm.fit, lm.score, lm.coef_, lm.intercept_ |
M2 | bayesian-statistics | Python | pandas, numpy, matplotlib | - |
M2 | principal-component-analysis | Python | pandas, numpy, statsmodels.multivariate.pca, sklearn.preprocessing | sklearn.preprocessing.StandardScaler, PCA |
M2 | time series analysis | Python | pandas, numpy, pandas.plotting, statsmodels.tsa.stattools, statsmodels.tsa.arima_model, statsmodels.tools.eval_measures | statsmodels.api.tsa.seasonal_decompose |
M2 | introduction-to-recommender-systems | Python | pandas, numpy, scipy.spatial.distance | - |
M2 | survival-analysis | Python | pandas, numpy, chart_studio.plotly, cufflinks | lifelines.KaplanMeierFitter |
M3 | introduction-to-machine-learning | Python | pandas, numpy, datetime, sklearn.model_selection, sklearn.linear_model, sklearn.pipeline, sklearn.metrics, sklearn.preprocessing, feature_engine, feature_engine.encoding, feature_engine.discretisation | pd.to_numeric, df.interpolate, np.where, dt.strptime, dt.toordinal, train_test_split |
M3 | introduction-to-sklearn | Python | pandas, sklearn.linear_model, sklearn.datasets, sklearn.preprocessing, sklearn.model_selection, statsmodels.api, sklearn.metrics, sklearn.feature_selection | LinearRegression, load_diabetes, PolynomialFeatures, StandardScaler, train_test_split, sm.OLS, r2_score, RFE |
M3 | supervised-learning | Python | pandas, seaborn, sklearn.model_selection, sklearn.linear_model, sklearn.neighbors, sklearn.preprocessing | df.corr, sns.heatmap, df.drop, df.dropna, pd.get_dummies, train_test_split, LogisticRegression, confusion_matrix, accuracy_score, KNeighborsClassifier, RobustScaler |
M3 | supervised-classification | Python | pandas, numpy, matplotlib, sklearn.model_selection, sklearn.linear_model, sklearn.tree, sklearn.neighbors, sklearn.naive_bayes, sklearn.metrics, sklearn.ensemble, sklearn.svm, sklearn.multiclass | train_test_split, LogisticRegression, KNeighborsClassifier, DecisionTreeClassifier, GaussianNB, RandomForestClassifier, LinearSVC, OneVsOneClassifier |
M3 | supervised-model-evaluation | Python | pandas, sklearn.model_selection, sklearn.linear_model, sklearn.metrics, sklearn.neighbors | train_test_split, LinearRegression, LogisticRegression, KNeighborsClassifier |
M3 | sklearn-and-unsupervised-learning | Python | pandas, numpy, matplotlib, sklearn.preprocessing, sklearn.cluster, mpl_toolkits.mplot3d | LabelEncoder, KMeans, fig.gca(projection='3d') |
M3 | unsupervised-learning | Python | pandas, numpy, matplotlib, sklearn.preprocessing, sklearn.cluster, sklearn.metrics | StandardScaler, KMeans, DBSCAN |
M3 | unsupervised-learning-deep dive | Python | pandas, numpy, matplotlib, math, sklearn.preprocessing, sklearn.cluster, scipy.cluster.hierarchy, hdbscan | pd.get_dummies, np.percentile, StandardScaler, KMeans |
M3 | unsupervised-learning-evaluation | Python | pandas, numpy, matplotlib, seaborn, sklearn.cluster, sklearn.manifold, yellowbrick.cluster, sklearn.decomposition | KElbowVisualizer, AgglomerativeClustering, PCA, KMeans, TSNE |