GitHub - Nov11/cs186-2013: simpleDB

Original Course link:

(CS186)Intro to Database Systems - Fall 2013

About building

Alter 'sourceversion' in build.xml to '1.9' to make Ant using JDK 9.

Progress

project 1:

[x] Tuple and TupleDesc
[x] Catalog
[x] BufferPool
[x] HeapFile access method
[x] Operators

project 2:

[x] Filter and Join
[x] Aggregates
[x] HeapFile Mutability
[x] Insertion and deletion
[x] Page eviction

project 3:

[x] Filter Selectivity
[x] Join Cardinality
[x] Join Ordering

project 4:

[x] Granting Locks & Lock Lifetime
[x] Implementing NO STEAL
[x] Transactions
[x] Deadlocks and Aborts

modified test cases on project 4

testHeapFileScanWithManyPages: change transaction id passed to seqscan from null to a local variable transaction id. as back to project 2, there's no concept of transaction and the param is not used.
grabLock wrap assertNull with if clause on expected == true. if it's expected to be false, the later one could possibly produce some error message. if it's expected to be true, then it should not.

deadlock detection and aborts is implemented in a very naive way by using timeout.

it cant pass transaction system test, but takes several minutes to finish. guess win10 and ubuntu differ in some way so that the naive version cannot work on ubuntu. I made some improvement.

request on shared lock and exclusive lock will be aborted when timed out.
when there's already a transaction waiting on a write lock:
- a transaction with higher tid which is considered younger aborts itself.
- a transaction with lower tid which is considered older and have completed much more work place itself as the waiting transaction with the lowest tid and wake waiting thread on this lock
- on wake up, if a transaction sees that its tid is greater than the lowest one, it aborts itself

so that every time concurrent threads acquire a shared lock and wait on an exclusive lock on the same page, the one with lowest tid will proceed.

but the wait time is tricky to pick. if too many transactions aborts and it times out before the last one finishes aborting, this won't work. there's no guarantee on delay but should be ok to proceed. Personally I prefer deadlock avoidance wait-die / wounded-wait to deadlock detection.

this is the answer for project 3 exercise 1

exercise 1:

simpledb.Parser.handleQueryStatement():

Get a logical plan from SQL parser by calling parseQueryLogicalPlan.
Get a physical plan base on previous logical plan.

simpledb.Parser.parseQueryLogicalPlan():

Iterate through from clause and set table name & alias in logical plan.
Deal with where clause. Determine a join or filter on types of operands of a operator and make corresponding sub node in logical plan.
Extract group by a field.
Iterate through select clause and add projection fields to logical plan. If an aggregate is encountered, extract field id and store infos in logical plan.
If there is an order by clause, extract and add the field and ASC/DESC order in logical plan.

simpledb.LogicalPlan.physicalPlan():

Iterate through base tables and
- make a sequential scan for each table alias.
- make an entry of table statistics
- make an entry of filter selectivity
Iterate through filters of this logical plan
- convert filter info into a predicate for execution usage
- compute selectivity of the table on value specified in filter
- update selectivity of the used table by multiple up with original one
Iterate through query plan which is generated by join optimizer
- use the two iterators in logical plan node to construct join operator
- substitute the sub plan on the left hand side with the new join operator
- remove right hand side operator if it's not a sub query
Iterate through select list
- if it's an aggregate operation, add an int field in output fields
- on aggregate but no operation, add an field that has "group by" param's type
- add fields and their corresponding types if a 'null.*' wildcards shows up.
- look up field id and type in tuple description and add them to output
Makes aggregate node if needed
Makes order by node if needed

how a physical plan is made in short:

sequential scan(table access stage)
filter and join (make a tree structure of operators)
projection,aggregates and order by(deal with final output)

Notes:

When catalog.txt and data.data is placed in project top level of folder. The command line can't read table file due to path issue. Catalog.java used getParent to retrieve parent directory of catalog.txt if invoked by parser. If parser gets a relative path without parent directory, the original code results in path that begins with 'null'. A fix is to use no parent path if the return value is null.

below is for project 2:

SimpleDB>  select d.f1, d.f2 from data d;
 select d.f1, d.f2 from data d;
Started a new transaction tid = 0
Added scan of table d
Added select list field d.f1
Added select list field d.f2
d.f1	d.f2	
------------------
1 10

2 20

3 30

4 40

5 50

5 50


 6 rows.
Transaction 0 committed.
----------------
10.03 seconds

SimpleDB>

Contest(Optional)

Added an in-memory hash join for equal predicate. It is similar to GRACE join but doesn't write out buckets as both hash tables fits in memory at the same time.

SELECT p.title
FROM papers p
WHERE p.title LIKE 'selectivity';

Started a new transaction tid = 2
Added scan of table p
Added select list field p.title
p.title	
------------
Optimizing ethanol production selectivity.

Development of feedforward receptive field structure of a simple cell and its contribution to the orientation selectivity: a mod

Influences of formant bandwidth and auditory frequency selectivity on identification of place of articulation in stop consonants

A theoretical entropy score as a single value to express inhibitor selectivity.

ASH structure alignment package: Sensitivity and selectivity in domain classification.


 5 rows.
Transaction 2 committed.
----------------
0.26 seconds

SELECT p.title, v.name
FROM papers p, authors a, paperauths pa, venues v
WHERE a.name = 'E. F. Codd'
AND pa.authorid = a.id
AND pa.paperid = p.id
AND p.venueid = v.id;

Added table : authors with schema INT_TYPE(id),STRING_TYPE(name)
Added table : venues with schema INT_TYPE(id),STRING_TYPE(name),INT_TYPE(year),INT_TYPE(type)
Added table : papers with schema INT_TYPE(id),STRING_TYPE(title),INT_TYPE(venueid)
Added table : paperauths with schema INT_TYPE(paperid),INT_TYPE(authorid)
Computing table stats.
Done.
SimpleDB> SELECT p.title, v.name
FROM papers p, authors a, paperauths pa, venues v
WHERE a.name = 'E. F. Codd'
AND pa.authorid = a.id
AND pa.paperid = p.id
AND p.venueid = v.id;SELECT p.title, v.name
SimpleDB> FROM papers p, authors a, paperauths pa, venues v
SimpleDB> WHERE a.name = 'E. F. Codd'
SimpleDB> AND pa.authorid = a.id
SimpleDB> AND pa.paperid = p.id
SimpleDB> AND p.venueid = v.id;

Started a new transaction tid = 0
Added scan of table p
Added scan of table a
Added scan of table pa
Added scan of table v
Added join between pa.authorid and a.id
Added join between pa.paperid and p.id
Added join between p.venueid and v.id
Added select list field p.title
Added select list field v.name
p.title	v.name	
-----------------------
Further Normalization of the Data Base Relational Model. IBM Research Report  San Jose  California

Universal  Relation Fails to Replace Relational Model (letter to the editor). IEEE Software

Interactive Support for Non-Programmers: The Relational and Network Approaches. IBM Research Report  San Jose  California

RENDEZVOUS Version 1: An Experimental English Language Query Formulation System for Casual Users of Relational Data Bases. IBM Research Report

Data Base Sublanguage Founded on the Relational Calculus. IBM Research Report  San Jose  California

Relational Completeness of Data Base Sublanguages. In: R. Rustin (ed.): Database Systems: 65-98  Prentice Hall and IBM Research Report RJ 987  San Jose  California

Derivability  Redundancy and Consistency of Relations Stored in Large Data Banks. IBM Research Report  San Jose  California

The Capabilities of Relational Database Management Systems. IBM Research Report  San Jose  California

Seven Steps to Rendezvous with the Casual User. IFIP Working Conference Data Base Management

Normalized Data Base Structure: A Brief Tutorial. IBM Research Report  San Jose  California

The Gamma-0 n-ary Relational Data Base Interface Specifications of Objects and Operations. IBM Research Report


 11 rows.
Transaction 0 committed.
----------------
1.07 seconds

SimpleDB>

SELECT a2.name, count(p.id)
FROM papers p, authors a1, authors a2, paperauths pa1, paperauths pa2
WHERE a1.name = 'Michael Stonebraker'
AND pa1.authorid = a1.id 
AND pa1.paperid = p.id 
AND pa2.authorid = a2.id 
AND pa1.paperid = pa2.paperid
GROUP BY a2.name
ORDER BY a2.name;

Added table : authors with schema INT_TYPE(id),STRING_TYPE(name)
Added table : venues with schema INT_TYPE(id),STRING_TYPE(name),INT_TYPE(year),INT_TYPE(type)
Added table : papers with schema INT_TYPE(id),STRING_TYPE(title),INT_TYPE(venueid)
Added table : paperauths with schema INT_TYPE(paperid),INT_TYPE(authorid)
Computing table stats.
Done.
SimpleDB> SELECT a2.name, count(p.id)
SimpleDB> FROM papers p, authors a1, authors a2, paperauths pa1, paperauths pa2
SimpleDB> WHERE a1.name = 'Michael Stonebraker'
SimpleDB> AND pa1.authorid = a1.id 
SimpleDB> AND pa1.paperid = p.id 
SimpleDB> AND pa2.authorid = a2.id 
SimpleDB> AND pa1.paperid = pa2.paperid
SimpleDB> GROUP BY a2.name
SimpleDB> ORDER BY a2.name;
Started a new transaction tid = 0
Added scan of table p
Added scan of table a1
Added scan of table a2
Added scan of table pa1
Added scan of table pa2
Added join between pa1.authorid and a1.id
Added join between pa1.paperid and p.id
Added join between pa2.authorid and a2.id
Added join between pa1.paperid and pa2.paperid
GROUP BY FIELD : a2.name
Added select list field a2.name
Aggregate field is p.id, agg fun is : count
Added select list field p.id
	 with aggregator count
a2.name	aggName(count)(p.id)	
-------------------------------------
Akhil Kumar 1

Dale Skeen 1

Eric N. Hanson 1

Lawrence A. Rowe 1

Michael Hirohama 1

Michael Stonebraker 8

Spyros Potamianos 1


 7 rows.
Transaction 0 committed.
----------------
7.65 seconds

SimpleDB>

below is for project 3 with standard data set 0.01

select d.fname, d.lname
from Actor a, Casts c, Movie_Director m, Director d
where a.id=c.pid and c.mid=m.mid and m.did=d.id 
and a.fname='John' and a.lname='Spicer';

Added table : Actor with schema INT_TYPE(id),STRING_TYPE(fname),STRING_TYPE(lname),STRING_TYPE(gender)
Added table : Movie with schema INT_TYPE(id),STRING_TYPE(name),INT_TYPE(year)
Added table : Director with schema INT_TYPE(id),STRING_TYPE(fname),STRING_TYPE(lname)
Added table : Casts with schema INT_TYPE(pid),INT_TYPE(mid),STRING_TYPE(role)
Added table : Movie_Director with schema INT_TYPE(did),INT_TYPE(mid)
Added table : Genre with schema INT_TYPE(mid),STRING_TYPE(genre)
Computing table stats.
Done.
Explain mode enabled.
SimpleDB> select d.fname, d.lname
SimpleDB> from Actor a, Casts c, Movie_Director m, Director d
SimpleDB> where a.id=c.pid and c.mid=m.mid and m.did=d.id 
SimpleDB> and a.fname='John' and a.lname='Spicer';
Started a new transaction tid = 6
Added scan of table a
Added scan of table c
Added scan of table m
Added scan of table d
Added join between a.id and c.pid
Added join between c.mid and m.mid
Added join between m.did and d.id
Added select list field d.fname
Added select list field d.lname
[d:m, m:c, a:c]
PATH SO FAR = [d:m]
PATH SO FAR = [m:c, d:m]
PATH SO FAR = [m:c, d:m, a:c]
The query plan is:
                            π(d.fname,d.lname),card:1
                            |
                            ⨝(a.id=c.pid),card:1
  __________________________|___________________________
  |                                                    |
  σ(a.lname=Spicer),card:1                             ⨝(m.mid=c.mid),card:29729
  |                                    ________________|_________________
  σ(a.fname=John),card:1               |                                |
  |                                    ⨝(d.id=m.did),card:2791          |
  |                           _________|_________                       |
  |                           |                 |                     scan(Casts c)
scan(Actor a)               scan(Director d)  scan(Movie_Director m)

d.fname d.lname 
------------------------

 0 rows.
Transaction 6 committed.
----------------
2.42 seconds

SimpleDB>

select d.fname, d.lname
from Actor a, Casts c, Movie_Director m, Director d, Movie mv
where a.id=c.pid and c.mid=m.mid and m.did=d.id and mv.id = m.did
and a.fname='John' and a.lname='Spicer' and mv.year > 2011;

SimpleDB> select d.fname, d.lname
SimpleDB> from Actor a, Casts c, Movie_Director m, Director d, Movie mv
SimpleDB> where a.id=c.pid and c.mid=m.mid and m.did=d.id and mv.id = m.did
SimpleDB> and a.fname='John' and a.lname='Spicer' and mv.year > 2011;
Started a new transaction tid = 12
Added scan of table a
Added scan of table c
Added scan of table m
Added scan of table d
Added scan of table mv
Added join between a.id and c.pid
Added join between c.mid and m.mid
Added join between m.did and d.id
Added join between mv.id and m.did
Added select list field d.fname
Added select list field d.lname
[d:m, m:c, a:c, mv:m]
PATH SO FAR = [d:m]
PATH SO FAR = [m:c, d:m]
PATH SO FAR = [m:c, d:m, a:c]
PATH SO FAR = [mv:m, m:c, d:m, a:c]
The query plan is:
                            π(d.fname,d.lname),card:1
                            |
                            ⨝(mv.id=m.did),card:1
  __________________________|__________________________
  |                                                   |
  σ(mv.year>2011),card:1                              ⨝(a.id=c.pid),card:1
  |                         __________________________|___________________________
  |                         |                                                    |
  |                         σ(a.lname=Spicer),card:1                             ⨝(m.mid=c.mid),card:29729
  |                         |                                    ________________|_________________
  |                         σ(a.fname=John),card:1               |                                |
  |                         |                                    ⨝(d.id=m.did),card:2791          |
  |                         |                           _________|_________                       |
scan(Movie mv)              |                           |                 |                     scan(Casts c)
                          scan(Actor a)               scan(Director d)  scan(Movie_Director m)

d.fname d.lname 
------------------------

 0 rows.
Transaction 12 committed.
----------------
0.64 seconds

SimpleDB>

Name		Name	Last commit message	Last commit date
Latest commit History 57 Commits
dblp_data		dblp_data
lib		lib
src/java/simpledb		src/java/simpledb
test/simpledb		test/simpledb
.classpath		.classpath
.gitignore		.gitignore
.project		.project
.travis.yml		.travis.yml
build.xml		build.xml
catalog.txt		catalog.txt
data.dat		data.dat
data.txt		data.txt
readme.md		readme.md
simpledb.iml		simpledb.iml
some_data_file.dat		some_data_file.dat
some_data_file.txt		some_data_file.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Original Course link:

About building

Progress

project 1:

project 2:

project 3:

project 4:

modified test cases on project 4

deadlock detection and aborts is implemented in a very naive way by using timeout.

this is the answer for project 3 exercise 1

Notes:

below is for project 2:

Contest(Optional)

below is for project 3 with standard data set 0.01

About

Releases

Packages

Languages

Nov11/cs186-2013

Folders and files

Latest commit

History

Repository files navigation

Original Course link:

About building

Progress

project 1:

project 2:

project 3:

project 4:

modified test cases on project 4

deadlock detection and aborts is implemented in a very naive way by using timeout.

this is the answer for project 3 exercise 1

Notes:

below is for project 2:

Contest(Optional)

below is for project 3 with standard data set 0.01

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages